Jako vstupní zdroje dat byla použita již transformovaná data od
zdravotních pojišťoven ČPZP a OZP upravená do datábázového formátu a
uložena v lokální DuckDB databázi.
Vstupní soubory byly očištěny o předpisy skupiny L04, které
by eventuelně mohly sloužit k samostatné analýze, ale nedají se
jednoznačně označit jako substituty kortikoidsteroidů.
Během vyhodnocování byli vždy vybráni pouze ti klienti, u kterých máme jistotu, že jsou pojištěni po celé sledované období. Pokud by noví klienti nebyli vyfiltrováni, hrozilo by výrazné zkreslení prvopředpisů a zkreslení celkového trendu způsobené fluktuací. Pokud klient v průběhu sledování zemřel, je bráno datum úmrtí jako datum konce pojištění. V případě dlouhodobého vyhodnocování jsou tedy vybráni klienti s počátkem pojištění před rokem 2015 a bez konce pojištění. Pro krátkodobé vyhodnocení (SCCS) byli vybráni klienti, kteří měli platné pojištění minimálně +- 365 dní od rozhodného data (vrchol vakcinace dané kohorty plus jeden týden).
Věk klienta je počítán k roku 2021, pokud není uvedeno jinak (např. U výpočtu rizika prvopředpisu podle věku).
Jako očkovaní klienti byli vybráni ti s alespoň jedním záznamem o vakcinaci a jako datum vakcinace bylo zvoleno datum první dávky.
Předpisy injekcí / infuzí byly agregovány do jednoho předpisu pouze u krátkodobé analýzy (SCCS). V ostatních případech sledujeme buď celkový trend (kde je výkyv v počtu předpisů způsobený větším množstvím injekcí také jeden z možných výsledků) nebo prvopředpisy (kde je množství irelevantní).
Při posuzování časových řad pomocí ETS modelu (příp. pomocí STL s lineární regresí) je posuzován nastavený trend do roku 2020 proti reálnému průběhu po tomto roce. Cíl tohoto přístupu je identifikovat, jestli měl COVID-19 nějaký vliv na celkový trend a případně, v jaké skupině dochází k největší odchylce.
V rámci analýz dlouhodobého dopadu jsou porovnávána data 2018-2020 proti datům z let 2022 a 2023. Smysl tohoto přístupu je odhalit, jestli měl COVID-19 (a potažmo očkování) nějaký vliv na období po relativním uklidnění. Data z let 2020 a 2021 jsou také vyřazena z důvodu nestability a výrazných sezónních propadů.
Pro dlouhodobé dopady byly použity 3 metriky, které je možné i částečně vyhodnotit vedle sebe. Vždy je bráno v úvahu, že pokud dochází k měření určitého stavu, musí mít všichni stejnou příležitost k dosažení daného stavu. Pro konkrétní metriky tedy:
Prvopředpisovost: Data z počátku měření vykazují jako prvopředpis v zásadě každý předpis. To je samozřejmě chybný odhad a tedy jsou ze všech prvopředpisových metrik vyjmuty roky 2015 a 2016. Zároveň pokud je míra prvopředpisovosti stále identická, tak celkový úhrn prvopředpisů v čase klesá (nikdo nemůže dostat prvopředpis dvakrát). Z toho důvodu je prvopředpisovost vždy počítaná jako počet prvopředpisů na 1000 dosud nepředepisovaných osob.
Multipředpisovost: Namísto celkového úhrnu předpisů je počítán poměr osob s více než jedním předpisem za zvolené období (měsíc nebo rok) na 1000 předepisovaných osob. Tento přístup umožní nejen vysledovat samotnou míru zvýšení či snížení, ale i odhadnout o jaký typ léčby jde.
Smysl by dávalo měřit i celkový počet předpisů na 1000 osob všeobecně a počty unikátních osob v daném období. Na obojí má ale výrazný vliv retence (tedy stav, kdy předepsaný klient dostává předpisy i ve všech dalších sledovaných časových intervalech) a vzhledem k tomu, že analýza je postavená na uzavřené skupině, kde nedochází k žádné fluktuaci klientů, tak by bylo nutné obě tyto metriky očistit o přirozenou míru udržení (kterou nelze jednoznačně odhadnout) a celkově by takové měření bylo náročné na interpretaci.
Krátkodobá analýza aplikuje metodu SCCS na prvopředpisy a veškeré předpisy. Smysl tohoto přístupu je odhalit bezprostřední rizika vakcinace a z pohledu celé analýzy je tento postup nejdůležitějším pro posouzení i z toho důvodu, že v kratším časovém horizontu lze předpokládat nižší vliv zavádějících proměnných způsobených časem.
Kontrolní součty z pohledu dlouhodobého sledování:
433 923 ČPZP 646 862106 160 ČPZP 248 729671 453358 346251 194 (včetně L04), 176 973 (H02)145 195 (včetně
L04), 144 793 (H02)prescriptions <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC) AS first_presc_date,
p.date AS presc_date,
v.date AS vacc_date,
d.drug_form,
d.ATC_group,
d.Prednison_equiv AS prednison_equiv,
d.force AS strength,
'CPZP' AS ins_company
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id
LEFT JOIN cpzp_vaccinations v ON c.client_id = v.client_id AND event_order = 1
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
p.date AS presc_date,
v.date AS vacc_date,
d.drug_form,
d.ATC_group,
d.Prednison_equiv AS prednison_equiv,
d.strength,
'OZP' AS ins_company
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id
LEFT JOIN ozp_vaccinations v ON c.client_id = v.client_id AND event_order = 1
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
") %>%
mutate(client_id = as.numeric(factor(client_id)),
vacc_yearmonth = floor_date(vacc_date, unit = "month"),
presc_yearmonth = floor_date(presc_date, unit = "month"),
vaccinated = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"),
sex = ifelse(sex == "Z", "F", sex),
first_presc_flag = ifelse(!is.na(presc_date) & first_presc_date == presc_date,1,0)) %>%
mutate(age_class = case_when(
age_2021 >= 5 & age_2021 <= 11 ~ "5-11",
age_2021 >= 12 & age_2021 <= 15 ~ "12-15",
age_2021 >= 16 & age_2021 <= 29 ~ "16-29",
age_2021 >= 30 & age_2021 <= 34 ~ "30-34",
age_2021 >= 35 & age_2021 <= 39 ~ "35-39",
age_2021 >= 40 & age_2021 <= 44 ~ "40-44",
age_2021 >= 45 & age_2021 <= 49 ~ "45-49",
age_2021 >= 50 & age_2021 <= 54 ~ "50-54",
age_2021 >= 55 & age_2021 <= 59 ~ "55-59",
age_2021 >= 60 & age_2021 <= 64 ~ "60-64",
age_2021 >= 65 & age_2021 <= 69 ~ "65-69",
age_2021 >= 70 & age_2021 <= 79 ~ "70-79",
age_2021 >= 80 ~ "80+",
TRUE ~ NA_character_
),
age_class = factor(age_class, levels = c(
"5-11", "12-15", "16-29", "30-34", "35-39", "40-44",
"45-49", "50-54", "55-59", "60-64", "65-69", "70-79", "80+"
))
)Na první pohled je viditelná odchylka od trendu během roku 2020 a následný strmější nárůst. Zároveň je také zřejmá vyšší míra předpisovosti u žen. Nižší počet předpisů u neočkované populace je jednoduše vysvětlitelný nižším počtem neočkovaných a pravděpodobně i jiným přístupem k prevenci.
total_presc_by_month <- prescriptions %>%
filter(!is.na(presc_yearmonth)) %>%
group_by(presc_yearmonth) %>%
summarise(total_prescriptions = n()) %>%
ungroup()
ggplot(total_presc_by_month, aes(x = presc_yearmonth, y = total_prescriptions)) +
geom_line(color = "steelblue", linewidth = 1, alpha = 0.6) +
stat_smooth(method = "loess", span = 0.3, se = TRUE, linewidth = 1.5, color = "firebrick") +
scale_x_date(
date_labels = "%Y",
date_breaks = "1 year",
expand = expansion(mult = c(0.01, 0.01))
) +
labs(
title = "Celkový úhrn předpisů",
x = "Rok",
y = "Počet předpisů"
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(color = "black", size = 12, face = "bold"),
plot.title = element_text(face = "bold", size = 16, hjust = 0.5)
)grouped_presc <- prescriptions %>%
filter(!is.na(presc_yearmonth)) %>%
group_by(presc_yearmonth, sex, vaccinated) %>%
summarise(total_prescriptions = n()) %>%
ungroup()
ggplot(grouped_presc, aes(x = presc_yearmonth, y = total_prescriptions,
color = vaccinated, linetype = sex)) +
geom_vline(xintercept = as.numeric(as.Date("2020-12-01")), linetype = "dotted", color = "gray70", linewidth = 0.4) +
geom_line(linewidth = 1, alpha = 0.8) +
stat_smooth(method = "loess", span = 0.3, se = TRUE, linewidth = 1.5, aes(color = vaccinated), linetype = "solid") +
scale_x_date(
date_labels = "%Y",
date_breaks = "1 year",
expand = expansion(mult = c(0.01, 0.01))
) +
scale_color_manual(values = c("Vakcinace" = "darkgreen", "Bez vakcinace" = "firebrick")) +
labs(
title = "Předpisy podle pohlaví a vakcinace",
x = "Rok",
y = "Počet předpisů",
color = "Vakcinace",
linetype = "Pohlaví"
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(color = "black", size = 12, face = "bold"),
plot.title = element_text(face = "bold", size = 16, hjust = 0.5)
)Zde ze zřejmých důvodů není použit věk v roce 2021, ale věk v roce prvopředpisu. Riziko je počítané jako podíl prvopředpisů na počet osob v daném věku, které ještě žádný předpis neměly. Vyřazené jsou prvopředpisy v roce 2015 a 2016, které budou pravděpodobně vykazovat vysokou chybovost u lidí předepisovaných sezónně. Distribuce z logických důvodů (nemáme k dispozici veškerá historická data pro každého klienta) částečně kopíruje populační křivku, výsledek je tedy nutné brát s jistou rezervou.
first_presc_dynamic_age <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'CPZP' AS ins_company
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'OZP' AS ins_company
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
") %>%
mutate(first_presc_yearmonth = floor_date(first_presc_date, unit = "month"),
sex = ifelse(sex == "Z", "F", sex),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"),
age_at_presc = year(first_presc_date) - birthyear)
obs_start <- 2017
obs_end <- 2023
max_age <- 80
filtered_data <- first_presc_dynamic_age %>%
mutate(
year_of_presc = year(first_presc_date),
age_of_presc = year_of_presc - birthyear,
age_at_start = obs_start - birthyear,
age_at_end = obs_end - birthyear
) %>%
filter(age_at_end >= 0) %>%
filter(age_at_start <= max_age)
age_seq <- 0:max_age
at_risk_by_age <- lapply(age_seq, function(a) {
n <- filtered_data %>%
filter(is.na(age_of_presc) | age_of_presc > a) %>%
n_distinct(.$client_id)
data.frame(age = a, n_at_risk = n)
}) %>%
bind_rows()
first_presc_by_age <- filtered_data %>%
filter(!is.na(age_of_presc), year(first_presc_date) >= obs_start) %>%
count(age = age_of_presc, name = "n_first_presc") %>%
filter(age <= max_age)
age_analysis <- left_join(at_risk_by_age, first_presc_by_age, by = "age") %>%
mutate(
n_first_presc = replace_na(n_first_presc, 0),
prop_first_presc = n_first_presc / n_at_risk
)
ggplot(age_analysis, aes(x = age, y = prop_first_presc * 1000)) +
geom_line(color = "steelblue") +
geom_point(aes(color = n_first_presc)) +
labs(
title = "Riziko prvopředpisu podle věku",
x = "Věk",
y = "Počet prvopředpisů na 1000 osob bez předpisu",
color = "Počet prvopředpisů"
) +
theme_minimal()assign_age_class <- function(birthyear) {
age <- 2021 - birthyear
cut(
age,
breaks = c(-Inf, 4, 11, 15, 29, 34, 39, 44, 49, 54, 59, 64, 69, 79, Inf),
labels = c(
"<5",
"5-11",
"12-15",
"16-29",
"30-34",
"35-39",
"40-44",
"45-49",
"50-54",
"55-59",
"60-64",
"65-69",
"70-79",
"80+"
),
right = TRUE,
ordered_result = TRUE
)
}
vaccination_first_shot_by_birthyear <- dbGetQuery(con,
"
SELECT DISTINCT
v.date,
c.birthyear,
COUNT(DISTINCT v.client_id) OVER (
PARTITION BY v.date, c.birthyear) AS vaccinated_total_in_day,
'OZP' AS ins_company
FROM ozp_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
WHERE
v.event_order = 1
AND i.ins_start_date < '2020-01-01' AND i.ins_end_date > '2023-01-01'
UNION
SELECT DISTINCT
v.date,
c.birthyear,
COUNT(DISTINCT v.client_id) OVER (
PARTITION BY v.date, c.birthyear) AS vaccinated_total_in_day,
'CPZP' AS ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
WHERE
v.event_order = 1
AND i.ins_start_date < '2020-01-01' AND i.ins_end_date > '2023-01-01'
"
) %>%
group_by(date, birthyear) %>%
summarise(vaccinated_total_in_day = sum(vaccinated_total_in_day, na.rm = TRUE)) %>%
ungroup() %>%
mutate(age_class = assign_age_class(birthyear)) %>%
filter(age_class != "<5")
pop_by_age_class <- dbGetQuery( con,
"
SELECT DISTINCT
birthyear,
COUNT(DISTINCT c.client_id) OVER (PARTITION BY c.birthyear) AS clients_count,
'OZP' AS ins_company
FROM ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
WHERE i.ins_start_date < '2020-01-01' AND i.ins_end_date > '2023-01-01'
UNION
SELECT DISTINCT
birthyear,
COUNT(DISTINCT c.client_id) OVER (PARTITION BY c.birthyear) AS clients_count,
'CPZP' AS ins_company
FROM cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
WHERE i.ins_start_date < '2020-01-01' AND i.ins_end_date > '2023-01-01'
"
) %>%
group_by(birthyear) %>%
summarise(clients_count = sum(clients_count, na.rm = TRUE)) %>%
ungroup() %>%
mutate(age_class = assign_age_class(birthyear)) %>%
group_by(age_class) %>%
summarise(population = sum(clients_count, na.rm = TRUE),
.groups = "drop") %>%
filter(age_class != "<5")
vaccination_start_date <- pop_by_age_class %>%
select(age_class) %>%
arrange(desc(age_class)) %>%
mutate(vaccination_start_date =
as.Date(
c(
"2021-01-15",
"2021-03-01",
"2021-04-14",
"2021-04-23",
"2021-04-28",
"2021-05-05",
"2021-05-10",
"2021-05-17",
"2021-05-24",
"2021-05-26",
"2021-06-04",
"2021-07-01",
"2021-12-13"
)
))Věkové kategorie jsou pro účely sledování proočkovanosti rozřazeny podle harmonogramu zahájení vakcinace. V některých případech byla část populace vakcinovaná dříve, pokud bylo zpřístupnění zahájeno na základě jiného kritéria než věkového (např. personál zdravotnických zařízení). Tyto věkové kategorie jsou následně použité i později při detailních analýzách.
vaccination_start_date_cz <- vaccination_start_date %>% filter(age_class != "<5")
colnames(vaccination_start_date_cz) <- c("Věková kategorie", "Datum zpřístupnění vakcinace")
vaccination_start_date_cz %>%
kable("html", caption = "Harmonogram zahájení vakcinace") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE)| Věková kategorie | Datum zpřístupnění vakcinace |
|---|---|
| 80+ | 2021-01-15 |
| 70-79 | 2021-03-01 |
| 65-69 | 2021-04-14 |
| 60-64 | 2021-04-23 |
| 55-59 | 2021-04-28 |
| 50-54 | 2021-05-05 |
| 45-49 | 2021-05-10 |
| 40-44 | 2021-05-17 |
| 35-39 | 2021-05-24 |
| 30-34 | 2021-05-26 |
| 16-29 | 2021-06-04 |
| 12-15 | 2021-07-01 |
| 5-11 | 2021-12-13 |
Je zřejmé, že každá věková kategorie se chovala v přístupu k očkování jinak. Patrný trend je v logice “Čím starší populace, tím vyšší proočkovanost”.
vaccinated_first_shot_cumulative_ratio <- vaccination_first_shot_by_birthyear %>%
group_by(age_class, date) %>%
summarise(
vaccinated_in_day = sum(vaccinated_total_in_day, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(age_class, date) %>%
group_by(age_class) %>%
mutate(cumulative_vaccinated = cumsum(vaccinated_in_day)) %>%
ungroup() %>%
inner_join(pop_by_age_class, by = "age_class") %>%
mutate(vaccinated_ratio = cumulative_vaccinated / population)
ggplot(
vaccinated_first_shot_cumulative_ratio %>%
filter(date <= as.Date("2022-01-01")),
aes(x = date, y = vaccinated_ratio, color = age_class)
) +
geom_line(size = 1) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Kumulativní proočkovanost dle první dávky vakcíny",
x = "Datum",
y = "Pročkovanost",
color = "Věková kategorie"
) +
theme_minimal()vaccination_cumulative_relative <- vaccinated_first_shot_cumulative_ratio %>%
left_join(vaccination_start_date, by = "age_class") %>%
mutate(days_prior_to_vaccination_start_date = as.integer(date - vaccination_start_date))
vaccination_cumulative_relative %>%
filter(date <= as.Date("2022-01-01")) %>%
ggplot(aes(x = days_prior_to_vaccination_start_date, y = vaccinated_ratio, color = age_class)) +
geom_line(size = 1) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Kumulativní proočkovanost",
subtitle = "Podle doby od zahájení vakcinace pro danou kategorii",
x = "Dnů od zahájení vakcinace",
y = "Proočkovanost",
color = "Věková kategorie"
) +
theme_minimal(base_size = 13)vaccinated_weekly_smooth <- vaccinated_first_shot_cumulative_ratio %>%
filter(age_class != "5-11") %>%
arrange(age_class, date) %>%
group_by(age_class) %>%
mutate(
vaccinated_7d_avg = zoo::rollmean(
vaccinated_in_day / population,
k = 7,
fill = NA,
align = "right"
)
) %>%
ungroup() %>%
left_join(vaccination_start_date, by = "age_class")
ggplot(
vaccinated_weekly_smooth %>%
filter(date <= as.Date("2022-01-01")),
aes(x = date, y = vaccinated_7d_avg)
) +
geom_line(color = "steelblue", size = 0.6) +
geom_vline(aes(xintercept = vaccination_start_date), color = "red", linetype = "dashed", size = 0.5) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
facet_wrap(~ age_class, scales = "free_y") +
labs(
title = "Sedmidenní průměr denní vakcinace",
subtitle = "S počátkem vakcinace (červená)",
x = "Datum",
y = "% Vakcinováno"
) +
theme_minimal(base_size = 12) +
theme(
strip.text = element_text(face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)Na základě míry kumulativní proočkovanosti podle doby od zahájení vakcinace lze spočítáním eukleidovských vzdáleností efektivně rozřadit věkové kategorie do širších kohort. Tyto kohorty lze později využít u analýz, které vyžadují větší množství dat (např. SCCS).
Výrazný rozdíl u skupiny 5-11 je způsoben tím, že
očkování bylo zpřístupněno později a sledované období končí v roce
2022.
wide_corr_data <- vaccination_cumulative_relative %>%
select(age_class,
days_prior_to_vaccination_start_date,
vaccinated_ratio) %>%
pivot_wider(names_from = age_class, values_from = vaccinated_ratio)
mat <- wide_corr_data %>% select(-days_prior_to_vaccination_start_date) %>% as.matrix()
dist_mat <- dist(t(mat), method = "euclidean")
dist_mat_full <- as.matrix(dist_mat)
color_palette <- viridis(100)
pheatmap(
dist_mat_full,
color = color_palette,
display_numbers = TRUE,
number_color = "black",
cluster_rows = TRUE,
cluster_cols = TRUE,
fontsize_number = 8,
fontsize = 10,
angle_col = 45,
legend = TRUE,
legend_breaks = seq(0, max(dist_mat_full), length.out = 5),
legend_labels = round(seq(0, max(dist_mat_full), length.out = 5), 2),
border_color = "grey60"
)Je nutné vzít v úvahu fakt, že samotný čas je jako prediktor obecně nedostatečný a výsledky tohoto pozorování slouží spíše k účelům nasměrování k dalšímu zkoumání.
Pro předpověď vývoje dat po roce 2020 na základě předchozích dat byla použita metoda ets R knihovny forecast.
K jednoduše vysvětlitelnému propadu dochází na začátku roku 2020 a poté k nárůstu oproti očekávaným hodnotám, který nelze vysvětlit jen vyvážením předchozího lokálního extrému. Zajímavostí je, že sezónní minima jsou předpovězena s vysokou přesností a k největším odchylkám dochází u sezónních maxim.
prescriptions_clean <- prescriptions %>%
filter(!is.na(presc_date)) %>%
mutate(presc_date = as.Date(presc_date)) %>%
arrange(presc_date) %>%
filter(presc_date >= as.Date("2015-01-01") & presc_date <= as.Date("2023-12-31")) %>%
mutate(count = 1)
monthly_data <- prescriptions_clean %>%
mutate(month = floor_date(presc_date, "month")) %>%
group_by(month) %>%
summarise(prescriptions = sum(count), .groups = 'drop')train_data <- monthly_data %>% filter(month < as.Date("2020-01-01"))
test_data <- monthly_data %>% filter(month >= as.Date("2020-01-01"))
ts_train <- ts(train_data$prescriptions, start = c(2015, 1), frequency = 12)
fit <- ets(ts_train)
n_ahead <- nrow(test_data)
forecast_result <- forecast(fit, h = n_ahead)
comparison <- test_data %>%
mutate(
predicted = forecast_result$mean,
residual = prescriptions - predicted
)ggplot() +
geom_line(data = monthly_data, aes(x = month, y = prescriptions), color = "blue", size = 1, alpha = 0.6) +
geom_line(data = comparison, aes(x = month, y = predicted), color = "red", size = 1, linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
labs(title = "Realita vs predikované hodnoty (ETS)",
x = "Datum", y = "Počet předpisů")ggplot(comparison, aes(x = month, y = residual)) +
geom_line(color = "steelblue", size = 1) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Odchylky 2020+", x = "Datum", y = "Odchylka")Pro otestování významnosti změny úhrnu byl proveden t-test mezi průměrnými odchylkami od ETS modelu pro data před a po roce 2020. Rozdíl je statisticky významný.
res_train <- residuals(fit)
res_test <- test_data$prescriptions - as.numeric(forecast_result$mean)
residuals_df <- bind_rows(
tibble(month = train_data$month, residual = res_train, period = "train"),
tibble(month = test_data$month, residual = res_test, period = "test")
)
ttest <- t.test(
residuals_df$residual[residuals_df$period == "train"],
residuals_df$residual[residuals_df$period == "test"]
)
cat("```\n", paste(capture.output(ttest), collapse = "\n"), "\n```")
Welch Two Sample t-test
data: residuals_df$residual[residuals_df$period == "train"] and residuals_df$residual[residuals_df$period == "test"]
t = -3.4167, df = 47, p-value = 0.001317
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2118.0197 -548.1661
sample estimates:
mean of x mean of y
-1.347016e-03 1.333092e+03
Pro analýzu užších skupin byla použita drobně odlišná metodika. V první iteraci je postup téměř identický jako v původním případu. Pro analýzu samotného trendu jsou ale data nejprve očištěna o sezónnost a šum pomocí metody stl a poté linearizována prostou lineární regresí. Tu lze zaprvé porovnat s očištěnými daty pro zjištění kvality modelu a zadruhé je změna trendu jednoduše patrná porovnáním sklonu každé z přímek.
Pro evaluaci odchylek je u neočištěných dat použito relativní RMSE
(rRMSE), které je spočítáno jako RMSE po roce 2020 vydělené
aritmetickým průměrem průběhu do roku 2020. To poté znázorňuje relativní
odchylku od původních dat.
U druhého přístupu umožňuje linerání regrese i vyhodnocení kvality
modelu. Toho je docíleno výpočtem koeficientu determinace pro data do
roku 2022 (r_squared) a p-hodnoty samotného modelu
(p_value_lm).
analyze_group <- function(df, group_id, group_type) {
monthly_data <- df %>%
mutate(month = floor_date(presc_date, "month")) %>%
group_by(month) %>%
summarise(prescriptions = sum(count), .groups = 'drop') %>%
arrange(month)
train_data <- monthly_data %>% filter(month < as.Date("2020-01-01"))
test_data <- monthly_data %>% filter(month >= as.Date("2020-01-01"))
if (nrow(train_data) < 24 || nrow(test_data) < 12) {
return(NULL)
}
train_records <- df %>% filter(presc_date < as.Date("2020-01-01")) %>% nrow()
ts_train <- ts(train_data$prescriptions, start = c(year(min(train_data$month)), month(min(train_data$month))), frequency = 12)
fit <- tryCatch(ets(ts_train, model = "ZZM"), error = function(e) NULL)
if (is.null(fit)) return(NULL)
forecast_result <- forecast(fit, h = nrow(test_data))
residuals_train <- residuals(fit)
residuals_test <- test_data$prescriptions - as.numeric(forecast_result$mean)
ttest <- t.test(residuals_train, residuals_test)
rmse_train <- sqrt(mean(residuals_train^2))
rmse <- sqrt(mean(residuals_test^2))
p_train <- residuals(fit)
mean_train <- mean(train_data$prescriptions)
rrmse <- rmse / mean_train
plot_data <- bind_rows(
train_data %>% mutate(predicted = as.numeric(fitted(fit))),
test_data %>% mutate(predicted = as.numeric(forecast_result$mean))
) %>%
mutate(group = group_id, group_type = group_type)
tibble(
group = group_id,
group_type = group_type,
train_records = train_records,
rmse_train = rmse_train,
rmse = rmse,
rrmse = rrmse,
t_stat = ttest$statistic,
p_value = ttest$p.value,
mean_train_resid = mean(residuals_train),
mean_test_resid = mean(residuals_test),
plot_data = list(plot_data)
)
}
analyze_group_trend <- function(df, group_id, group_type) {
monthly_data <- df %>%
mutate(month = floor_date(presc_date, "month")) %>%
group_by(month) %>%
summarise(prescriptions = sum(count), .groups = "drop") %>%
arrange(month)
if (nrow(monthly_data) < 36) return(NULL)
ts_full <- ts(monthly_data$prescriptions,
start = c(year(min(monthly_data$month)), month(min(monthly_data$month))),
frequency = 12)
stl_fit <- stl(ts_full, s.window = "periodic", robust = TRUE)
trend_full <- stl_fit$time.series[, "trend"]
trend_df <- tibble(
month = monthly_data$month,
trend = as.numeric(trend_full)
) %>%
mutate(
time_index = 1:n(),
post_2020 = month >= as.Date("2020-01-01")
)
pre_df <- filter(trend_df, !post_2020)
post_df <- filter(trend_df, post_2020)
lm_pre <- lm(trend ~ time_index, data = pre_df)
lm_post <- lm(trend ~ time_index, data = post_df)
lm_pre_summary <- summary(lm_pre)
pre_p_value <- lm_pre_summary$coefficients[2, 4]
pre_r_squared <- lm_pre_summary$r.squared
post_df <- post_df %>%
mutate(
extrapolated_pre_trend = predict(lm_pre, newdata = post_df),
actual_post_trend = predict(lm_post, newdata = post_df),
residuals = actual_post_trend - extrapolated_pre_trend
)
rmse_post <- sqrt(mean(post_df$residuals^2))
mean_pre_trend <- mean(pre_df$trend)
rrmse_post <- rmse_post / mean_pre_trend
ttest <- t.test(post_df$residuals)
trend_df <- trend_df %>%
mutate(
pre_fit = predict(lm_pre, newdata = trend_df),
post_fit = if_else(post_2020, predict(lm_post, newdata = trend_df), NA_real_)
)
tibble(
group = group_id,
group_type = group_type,
n_records = nrow(df),
slope_pre = coef(lm_pre)["time_index"],
slope_post = coef(lm_post)["time_index"],
p_value_pre = pre_p_value,
r_squared_pre = pre_r_squared,
rmse_post = rmse_post,
rrmse_post = rrmse_post,
mean_residual = mean(post_df$residuals),
t_stat = ttest$statistic,
p_value = ttest$p.value,
plot_data = list(trend_df)
)
}results_vacc <- prescriptions_clean %>%
filter(!is.na(vaccinated)) %>%
group_by(vaccinated) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$vaccinated)
analyze_group(df = filter(prescriptions_clean, vaccinated == group_val),
group_id = group_val,
group_type = "Vaccination")
}) %>%
compact() %>%
bind_rows()
results_sex <- prescriptions_clean %>%
filter(!is.na(sex)) %>%
group_by(sex) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$sex)
analyze_group(df = filter(prescriptions_clean, sex == group_val),
group_id = group_val,
group_type = "Sex")
}) %>%
compact() %>%
bind_rows()
results_all <- bind_rows(results_vacc, results_sex) %>%
arrange(p_value)
plot_data_all <- results_all %>%
unnest(plot_data, names_sep = "_")ETS model vykazuje největší rozdíl u nevakcinované skupiny a u mužů. T-test rozdílu odchylek je v obou případech statisticky významný.
results_all %>%
mutate(Skupina = group, RMSE = rmse, rRMSE = rrmse) %>%
select(
Skupina, RMSE, rRMSE, t_stat, p_value
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (ETS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | t_stat | p_value |
|---|---|---|---|---|
| M | 1396.8746 | 0.1692402 | -5.4883544 | 0.0000016 |
| Bez vakcinace | 696.6278 | 0.1651938 | -4.4321352 | 0.0000557 |
| F | 1759.1357 | 0.1127051 | -1.4953911 | 0.1414972 |
| Vakcinace | 2068.4206 | 0.1052896 | 0.0251487 | 0.9800428 |
ggplot(plot_data_all, aes(x = plot_data_month)) +
geom_line(aes(y = plot_data_prescriptions), color = "blue") +
geom_line(aes(y = plot_data_predicted), color = "red", linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
facet_wrap(~ paste(plot_data_group_type, plot_data_group, sep = ": "), scales = "free_y") +
labs(
title = "Realita vs Předpověď (ETS)",
subtitle = "Modrá = Realita, Červená = Předpověď",
x = "Datum", y = "Počet předpisů"
) +
theme_minimal()Linearizovaný trend vykazuje nárůst od předpokladu ve všech kategoriích. Největší rozdíl je opět patrný u skupiny neočkovaných a mužů. U nevakcinované populace je nutné vzít v úvahu velmi nízký koeficient determinace a tedy i nízké prediktivní schopnosti samotného modelu.
results_vacc <- prescriptions_clean %>%
filter(!is.na(vaccinated)) %>%
group_by(vaccinated) %>%
group_split() %>%
map(~ analyze_group_trend(.x, group_id = unique(.x$vaccinated), group_type = "Vaccinated")) %>%
compact() %>%
bind_rows()
results_sex <- prescriptions_clean %>%
filter(!is.na(sex)) %>%
group_by(sex) %>%
group_split() %>%
map(~ analyze_group_trend(.x, group_id = unique(.x$sex), group_type = "Sex")) %>%
compact() %>%
bind_rows()
results_all <- bind_rows(results_vacc, results_sex)
plot_data_all <- results_all %>%
unnest(plot_data)results_all %>%
mutate(Skupina = group, RMSE = rmse_post, rRMSE = rrmse_post, r_squared = r_squared_pre, p_value_diff = p_value, p_value_lm = p_value_pre) %>%
select(
Skupina, RMSE, rRMSE, p_value_lm, r_squared
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (linearizovaný trend)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | p_value_lm | r_squared |
|---|---|---|---|---|
| Bez vakcinace | 565.2184 | 0.1337668 | 0.0004409 | 0.1932531 |
| Vakcinace | 1042.0182 | 0.0528575 | 0.0000000 | 0.9306673 |
| F | 860.4047 | 0.0549194 | 0.0000000 | 0.9032496 |
| M | 765.4342 | 0.0924169 | 0.0000000 | 0.8175001 |
ggplot(plot_data_all, aes(x = month)) +
geom_line(aes(y = trend), color = "black", size = 0.8) +
geom_line(aes(y = pre_fit), color = "blue", size = 1.2, linetype = "dashed") +
geom_line(aes(y = post_fit), color = "red", size = 1.2, linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "solid", color = "gray40") +
facet_wrap(~ paste(group_type, group, sep = ": "), scales = "free_y") +
labs(
title = "Linearizované trendy v předpisovosti",
subtitle = "Modrá = Linearizovaný trend do 2020; Červená = Linearizovaný trend po 2020",
x = "Month",
y = "Počet předpisů"
) +
theme_minimal() +
theme(
strip.text = element_text(face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)Pro porovnání věkových kategorií byla použita stejná metodika jako u skupin podle vakcinace a pohlaví.
results_age <- prescriptions_clean %>%
filter(!is.na(age_class)) %>%
group_by(age_class) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$age_class)
analyze_group(df = filter(prescriptions_clean, age_class == group_val),
group_id = group_val,
group_type = "Age Class")
}) %>%
compact() %>%
bind_rows()
results_age <- results_age %>%
arrange(group)
plot_data_age <- results_age %>%
unnest(plot_data, names_sep = "_")K největšímu relativnímu rozdílu dochází u skupin 5-11,
12-15 a 16-29. To částečně vysvětluje i
předchozí porovnání, kde největší rozdíl vykazovala skupina
nevakcinovaných. Rozdíl reziduál je však významný jen u skupiny
16-29. Skupiny 60-64 a 65-69
vykazují také signifikantní odchylku od očekávaných hodnot.
results_age %>%
mutate(Skupina = group, RMSE = rmse, rRMSE = rrmse) %>%
select(
Skupina, RMSE, rRMSE, t_stat, p_value
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (ETS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | t_stat | p_value |
|---|---|---|---|---|
| 5-11 | 116.56860 | 0.2761308 | 0.1210174 | 0.9041932 |
| 12-15 | 60.02150 | 0.2807804 | -1.1662273 | 0.2494064 |
| 16-29 | 238.75694 | 0.2524881 | -4.8869924 | 0.0000123 |
| 30-34 | 87.74207 | 0.1384490 | 7.4119505 | 0.0000000 |
| 35-39 | 106.79332 | 0.1268053 | -2.5359254 | 0.0145986 |
| 40-44 | 208.35234 | 0.1473983 | -4.2292198 | 0.0001075 |
| 45-49 | 290.60068 | 0.1308266 | -4.2824696 | 0.0000905 |
| 50-54 | 242.86926 | 0.1050056 | 0.3906080 | 0.6978519 |
| 55-59 | 262.05711 | 0.0990215 | 0.6870654 | 0.4954179 |
| 60-64 | 312.48431 | 0.1194154 | -4.6045264 | 0.0000316 |
| 65-69 | 495.21694 | 0.1727451 | -4.7074821 | 0.0000224 |
| 70-79 | 618.10440 | 0.1224329 | 0.1086331 | 0.9139560 |
| 80+ | 253.20135 | 0.1508588 | 5.8897066 | 0.0000004 |
ggplot(plot_data_age, aes(x = plot_data_month)) +
geom_line(aes(y = plot_data_prescriptions), color = "blue") +
geom_line(aes(y = plot_data_predicted), color = "red", linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
facet_wrap(~ plot_data_group, scales = "free_y", ncol = 2) +
labs(
title = "Realita vs Předpověď (ETS)",
subtitle = "Modrá = Realita, Červená = Předpověď",
x = "Datum", y = "Počet předpisů"
) +
theme_minimal()results_age <- prescriptions_clean %>%
filter(!is.na(age_class)) %>%
group_by(age_class) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$age_class)
analyze_group_trend(df = filter(prescriptions_clean, age_class == group_val),
group_id = group_val,
group_type = "Age Class")
}) %>%
compact() %>%
bind_rows()
results_age <- results_age %>%
arrange(group)
plot_data_age <- results_age %>%
unnest(plot_data)Porovnání linearizovaných trendů víceméně potvrzuje některé
předpoklady nalezené v předchozí analýze. Bohužel mladší skupiny (pod 15
let) mají natolik výrazné výkyvy i po očištění o sezónnost, že není
možné jejich průběh linearizovat s vysokým koeficientem determinace.
Skupina 16-29 již nevykazuje tak výraznou odchylku od
předpokladu a u starší populace vykazuje výrazný rozdíl jen
65-69.
results_age %>%
mutate(Skupina = group, RMSE = rmse_post, rRMSE = rrmse_post, r_squared = r_squared_pre, p_value_diff = p_value, p_value_lm = p_value_pre) %>%
select(
Skupina, RMSE, rRMSE, p_value_lm, r_squared
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (linearizovaný trend)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | p_value_lm | r_squared |
|---|---|---|---|---|
| 5-11 | 223.26393 | 0.5500878 | 0.0000000 | 0.6525649 |
| 12-15 | 95.46844 | 0.4596474 | 0.0000000 | 0.6818070 |
| 16-29 | 65.26420 | 0.0689561 | 0.0000000 | 0.9517284 |
| 30-34 | 41.46398 | 0.0652822 | 0.0000000 | 0.9574665 |
| 35-39 | 43.78888 | 0.0518640 | 0.0000000 | 0.9903928 |
| 40-44 | 105.50875 | 0.0746866 | 0.0000000 | 0.9401286 |
| 45-49 | 239.89668 | 0.1077107 | 0.0000000 | 0.9644207 |
| 50-54 | 155.70665 | 0.0672456 | 0.0000000 | 0.9816383 |
| 55-59 | 130.29275 | 0.0491745 | 0.0000000 | 0.9537575 |
| 60-64 | 117.37329 | 0.0447723 | 0.0000000 | 0.8245083 |
| 65-69 | 341.38374 | 0.1183617 | 0.0000000 | 0.5065845 |
| 70-79 | 192.17282 | 0.0379005 | 0.0000000 | 0.8453986 |
| 80+ | 103.83404 | 0.0616256 | 0.1008976 | 0.0457235 |
ggplot(plot_data_age, aes(x = month)) +
geom_line(aes(y = trend), color = "black", size = 0.8) +
geom_line(aes(y = pre_fit), color = "blue", size = 1.2, linetype = "dashed") +
geom_line(aes(y = post_fit), color = "red", size = 1.2, linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "solid", color = "gray40") +
facet_wrap(~ group, scales = "free_y", ncol = 2) +
labs(
title = "Linearizované trendy v předpisovosti",
subtitle = "Modrá = Linearizovaný trend do 2020; Červená = Linearizovaný trend po 2020",
x = "Month",
y = "Počet předpisů"
) +
theme_minimal() +
theme(
strip.text = element_text(face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)Tato úvodní analýza poskytla vhodný startovací bod pro následné
zkoumání konkrétních věkových kohort. Konkrétní skupiny vybrané pro
další zkoumání tedy jsou: 12-15 z důvodu největšího výkyvu
(skupina je ale bohužel natolik malá, že pravděpodobně nebude možné
dojít k relevantním závěrům), 16-29, 30-39 a
40-59. Skupina 5-11 je vyřazena z několika
důvodů. Prvním je nižší proočkovanost a druhým je fakt, že nejmladší
populace této skupiny je na začátku dlouhodobého meření buď nenarozená
nebo čerstvě na světě a může zde probíhat mnoho výkyvů. Skupina
60-69 by jistě také poskytla zajímavé výsledky, ale z
časových důvodů a z důvodu zadání soutěže (orientace především na mladší
populaci) byla prozatím také vyřazena.
Detailní analýza je rozdělená na dvě části. Dlouhodobou a krátkodobou. Zatímco dlouhodobá analýza porovnává trendy v období před covidem (<2020) vs po covidu (>=2022), tak krátkodobá analýza sleduje efekty v horizontu jednoho roku od vakcíny s porovnáním stejně dlouhého období před vakcinací.
Pokud dochází k výkyvu celkové předpisovosti, tak musí zákonitě
docházet i ve výkyvu buď v prvopředpisovosti, multipředpisovosti nebo
obojím. Tyto dvě metriky jsou v dlouhodobé analýze měřené na 1000 osob v
riziku, aby byly alespoň částečně porovnatelné. Pro prvopředpisovost
tedy počet prvopředpisů na 1000 dosud nepředepsaných osob a pro
multipředpisovost poměr multipředpisovaných osob v daném období na 1000
předpisovaných osob (období jsou roční a měsíční). Zároveň je také
sledovaná míra retence (tedy kolik procent prvopředepisovaných bylo
předepsaných i v následujícíh letech). Z těchto naměřených hodnot je
následně vypočten poměr po vs před (a t-test na rozdíl průměrných
hodnot) a statistická významnost rozdílu mezi očkovanými a neočkovanými
je vyhodnocena pomocí p-hodnoty pro postCOVID:vaccinated
lineárního modelu
log(RatePer1000) ~ postCOVID * vaccinated, kde proměnné
postCOVID a vaccinated nabývají hodnot pouze 1
a 0.
Pro krátkodobé efekty je použita metoda SCCS,
konkrétně tedy standardsccs(). Očkovaní a neočkovaní jsou
zde porovnáni s vlastní baseline 365 dní před vakcinací a následně jsou
posouzena rizika prvopředpisů a předpisů celkem po vakcíně. Rizika jsou
následně porovnána mezi očkovanými a neočkovanými. V případě
neočkovaných je datum vakcinace nastaveno jako rozhodné datum v dané
skupině. Skupina očkovaných vybraná pro analýzu je podle vakcinace
(první dávka) +- 7 dní od rozhodného data. Do analýzy jsou oproti všem
ostatním přístupům zahrnuti klienti, kteří byli pojištěni po dobu +-
jednoho roku od vakcinace (v ostatních přístupech jsou zahrnuti klienti
pojištěni minimálně po celých 8 let) a multipředpisy injekcí / infuzí v
rozsahu 10 dnů jsou evidovány jako pouze jeden předpis, aby měla každá
událost zhruba stejnou váhu. Období po podání vakcíny byla nastavena
jako 0-29, 30-89, 90-179, 180-269 a 270-365 dní. Kratší intervaly
bohužel často vykazovaly nižší množství dat a tedy insignifikantní
výsledky. Skupiny 16-29 a 30-39 zde byly
sloučeny do jedné z důvodu nízké relevance výstupů u samostatné
evaluace. Pro testování rozdílů mezi skupinami byl použit Waldův test na
rozdíl log(Hazard Ratio) v rámci DiD analýzy.
age_cohorts <- vaccination_start_date %>%
arrange(age_class)
age_cohorts <- age_cohorts[2:9 , ]
age_cohorts <- age_cohorts %>%
mutate(
cohort = c(rep("12-15", 1), rep("16-39", 3), rep("40-59", 4))
)
vaccination_peaks <- vaccinated_first_shot_cumulative_ratio %>%
inner_join(age_cohorts, by = "age_class") %>%
group_by(cohort, date) %>%
summarise(vaccinated_in_day = sum(vaccinated_in_day, na.rm = TRUE), .groups = "drop") %>%
arrange(cohort, date) %>%
group_by(cohort) %>%
mutate(vaccinated_in_day_ma7 = zoo::rollmean(vaccinated_in_day, k = 7, fill = NA, align = "right")) %>%
ungroup()
peak_dates <- vaccination_peaks %>%
group_by(cohort) %>%
filter(vaccinated_in_day == max(vaccinated_in_day, na.rm = TRUE)) %>%
slice(1) %>%
ungroup() %>%
select(cohort, peak_date = date) %>%
mutate(decisive_date = peak_date + days(7))Z důvodu vrcholu očkování krátce po zahájení je rozhodné datum nastavené o 7 dní později než datum vrcholu.
age_cohorts %>%
group_by(cohort) %>%
summarise(vaccination_start_date = max(vaccination_start_date)) %>%
left_join(peak_dates, by = "cohort") %>%
transmute(Kohorta = cohort, `Datum zahájení vakcinace` = vaccination_start_date, `Vrchol vakcinace` = peak_date, `Rozhodné datum` = decisive_date) %>%
kable(format = "html", caption = "Zahájení a vrcholy vakcinace dle kohorty") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Kohorta | Datum zahájení vakcinace | Vrchol vakcinace | Rozhodné datum |
|---|---|---|---|
| 12-15 | 2021-07-01 | 2021-07-28 | 2021-08-04 |
| 16-39 | 2021-06-04 | 2021-06-08 | 2021-06-15 |
| 40-59 | 2021-05-17 | 2021-05-14 | 2021-05-21 |
ggplot(vaccination_peaks, aes(x = date, y = vaccinated_in_day_ma7)) +
geom_line(color = "steelblue") +
facet_wrap(~ cohort, scales = "free_y") +
geom_vline(data = peak_dates, aes(xintercept = as.numeric(decisive_date)), color = "red", linetype = "dashed") +
labs(title = "Sedmidenní průměr denní vakcinace a vrcholy vakcinace",
x = "Datum", y = "Vakcinovaných za den") +
theme_minimal()Ačkoliv je věková distribuce podle vakcinace mírně odlišná, neměla by mít výrazný vliv z pohledu rozdílných rizik podle stáří.
unique_clients <- prescriptions %>%
filter(!is.na(age_2021), !is.na(sex), !is.na(vaccinated)) %>%
distinct(client_id, .keep_all = TRUE)
age_data <- unique_clients %>%
group_by(vaccinated, age_2021) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Age",
value = as.character(age_2021)) %>%
ungroup()
sex_data <- unique_clients %>%
group_by(vaccinated, sex) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Sex",
value = sex) %>%
ungroup()ggplot(unique_clients, aes(x = age_2021, fill = factor(vaccinated))) +
geom_histogram(aes(y = after_stat(density)), position = "identity", alpha = 0.4, binwidth = 1) +
scale_fill_brewer(palette = "Set1", name = "Vaccinated") +
scale_x_continuous(breaks = seq(floor(min(unique_clients$age)), ceiling(max(unique_clients$age)), by = 1)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Věkové rozložení podle statusu vakcinace",
x = "Věk", y = "%",
fill = "Vakcinace"
) +
theme_minimal()ggplot(sex_data, aes(x = sex, y = prop, color = vaccinated, group = vaccinated)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Pohlaví podle vakcinace",
x = "Pohlaví",
y = "%",
color = "Vakcinace"
) +
theme_minimal()ETS model vyhodnocuje rozdíl v nastaveném trendu zejména u ženské a neočkované populace. Tyto rozdíly po vs před rokem 2020 jsou statisticky významné.
prescriptions_clean <- prescriptions %>%
filter(!is.na(presc_date)) %>%
mutate(presc_date = as.Date(presc_date)) %>%
arrange(presc_date) %>%
filter(presc_date >= as.Date("2015-01-01") & presc_date <= as.Date("2023-12-31")) %>%
mutate(count = 1)
results_vacc <- prescriptions_clean %>%
filter(!is.na(vaccinated)) %>%
group_by(vaccinated) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$vaccinated)
analyze_group(df = filter(prescriptions_clean, vaccinated == group_val),
group_id = group_val,
group_type = "Vaccination")
}) %>%
compact() %>%
bind_rows()
results_sex <- prescriptions_clean %>%
filter(!is.na(sex)) %>%
group_by(sex) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$sex)
analyze_group(df = filter(prescriptions_clean, sex == group_val),
group_id = group_val,
group_type = "Sex")
}) %>%
compact() %>%
bind_rows()
results_all <- bind_rows(results_vacc, results_sex) %>%
arrange(p_value)
plot_data_all <- results_all %>%
unnest(plot_data, names_sep = "_")results_all %>%
mutate(Skupina = group, RMSE = rmse, rRMSE = rrmse) %>%
select(
Skupina, RMSE, rRMSE, t_stat, p_value
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (ETS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | t_stat | p_value |
|---|---|---|---|---|
| Bez vakcinace | 47.67759 | 0.4239264 | -5.4861625 | 0.0000016 |
| F | 36.48232 | 0.4145718 | -3.1989369 | 0.0024708 |
| M | 32.17169 | 0.2558046 | 1.5519815 | 0.1273749 |
| Vakcinace | 33.29083 | 0.3286361 | -0.8040151 | 0.4254362 |
ggplot(plot_data_all, aes(x = plot_data_month)) +
geom_line(aes(y = plot_data_prescriptions), color = "blue") +
geom_line(aes(y = plot_data_predicted), color = "red", linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
facet_wrap(~ paste(plot_data_group_type, plot_data_group, sep = ": "), scales = "free_y") +
labs(
title = "Realita vs Předpověď (ETS)",
subtitle = "Modrá = Realita, Červená = Předpověď",
x = "Datum", y = "Počet předpisů"
) +
theme_minimal()Všeobecná prvopředpisovost je na celé populaci mírně vyšší v období po covidu.
first_presc <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'CPZP' AS ins_company
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 12 AND 15
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'OZP' AS ins_company
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 12 AND 15
") %>%
mutate(first_presc_yearmonth = floor_date(first_presc_date, unit = "month"),
sex = ifelse(sex == "Z", "F", sex),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"),
age_2021 = 2021 - birthyear)
monthly_first_presc <- first_presc %>%
count(first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(first_presc_yearmonth) %>%
mutate(
cum_new_presc = cumsum(new_prescriptions),
total_population = nrow(first_presc),
at_risk_population = total_population - lag(cum_new_presc, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_clean <- monthly_first_presc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
ttest <- t.test(rate ~ period_group, data = monthly_clean)ggplot(monthly_first_presc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000)) +
geom_line(color = "steelblue", size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet prvopředpisů na 1000 nepředepsaných klientů", limits = c(0, 2)) +
labs(
title = "Prvopředpisovost v čase",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum"
) +
theme_minimal()
Welch Two Sample t-test
data: rate by period_group
t = 2.9636, df = 44.846, p-value = 0.004853
alternative hypothesis: true difference in means between group Post-COVID and group Pre-COVID is not equal to 0
95 percent confidence interval:
5.783436e-05 3.032637e-04
sample estimates:
mean in group Post-COVID mean in group Pre-COVID
0.001248165 0.001067616
total_by_vacc <- first_presc %>%
count(vacc_status, name = "total_population")
monthly_by_vacc <- first_presc %>%
count(vacc_status, first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(vacc_status, first_presc_yearmonth) %>%
group_by(vacc_status) %>%
mutate(
cum_prescriptions = cumsum(new_prescriptions),
total_population = total_by_vacc$total_population[match(vacc_status, total_by_vacc$vacc_status)],
at_risk_population = total_population - lag(cum_prescriptions, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
ungroup() %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_by_vacc_clean <- monthly_by_vacc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
first_presc_table <- monthly_by_vacc_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(rate ~ period_group)$p.value,
mean_before = mean(rate[period_group == "Pre-COVID"]),
mean_after = mean(rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before
)Prvopředpisovost po covidu vykazuje nárůst u obou kategorií, v případě očkovaných však vyšší a statisticky významný.
first_presc_table %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before * 1000, `Průměr po` = mean_after * 1000, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání prvopředpisovosti v čase podle vakcinace") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.1763358 | 1.010252 | 1.094448 | 1.083342 |
| Vakcinace | 0.0007757 | 1.132464 | 1.423175 | 1.256706 |
ggplot(monthly_by_vacc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000, color = vacc_status)) +
geom_line(size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Prescriptions per 1,000 at-risk clients", limits = c(0, 2.5)) +
labs(
title = "Prvopředpisovost v čase podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()DiD model ovšem statisticky významný rozdíl mezi skupinami nepotvrzuje (p hodnota pro interakci vaccinated:post_covid).
monthly_rates <- monthly_by_vacc %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID")) %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(first_presc_yearmonth >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid)
did_model <- lm(log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
Residuals:
Min 1Q Median 3Q Max
-0.80598 -0.26332 0.00375 0.17636 1.17803
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.22949 0.04683 4.900 2.28e-06 ***
vaccinated 0.09257 0.06623 1.398 0.1641
post_covid -0.15655 0.08762 -1.787 0.0758 .
vaccinated:post_covid 0.16884 0.12391 1.363 0.1749
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3628 on 164 degrees of freedom
Multiple R-squared: 0.05497, Adjusted R-squared: 0.03768
F-statistic: 3.18 on 3 and 164 DF, p-value: 0.02555
presc_id_counts <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'CPZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 12 AND 15
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'OZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 12 AND 15
")%>%
mutate(presc_year = as.integer(format(as.Date(presc_date), "%Y")),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"))
monthly_unique_clients <- presc_id_counts %>%
mutate(yearmonth = floor_date(presc_date, unit = "month")) %>%
distinct(client_id, yearmonth, vacc_status) %>%
count(yearmonth, vacc_status, name = "n_unique_clients")
presc_years <- presc_id_counts %>%
mutate(
presc_year = year(presc_date)
) %>%
distinct(client_id, first_presc_year, presc_year, vacc_status) %>%
filter(first_presc_year >= 2017, first_presc_year <= 2023)
retention_data <- presc_years %>%
mutate(year_offset = presc_year - first_presc_year) %>%
filter(year_offset >= 0)
retention_summary <- retention_data %>%
group_by(first_presc_year, year_offset, vacc_status) %>%
summarise(n_clients = n_distinct(client_id), .groups = "drop")
cohort_sizes <- retention_data %>%
filter(year_offset == 0) %>%
group_by(first_presc_year, vacc_status) %>%
summarise(cohort_size = n_distinct(client_id), .groups = "drop")
retention_summary <- retention_summary %>%
left_join(cohort_sizes, by = c("first_presc_year", "vacc_status")) %>%
mutate(retention_rate = n_clients / cohort_size)
retention_summary_filtered <- retention_summary %>%
filter(year_offset > 0)Vzhledem k tomu, že roční retence lze sledovat jen v několikaletém horizontu, nemáme dostatek relevantních dat pro posouzení vývoje po vs před covidem. Všeobecně je ale míra retence srovnatelná.
ggplot(retention_summary_filtered, aes(x = year_offset, y = retention_rate, color = as.factor(first_presc_year))) +
geom_line(size = 1.2) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_color_viridis_d(name = "Rok prvopředpisu") +
labs(
title = "Retence (meziroční předpisovost) podle vakcinace",
x = "Počet let od prvopředpisu",
y = "Míra multipředpisovosti"
) +
facet_wrap(~ vacc_status) +
theme_minimal(base_size = 13)multi_presc_yearly <- multi_presc %>%
mutate(year = year(date)) %>%
filter(year > 2017)
multi_clients_yearly <- multi_presc_yearly %>%
group_by(year, vacc_status, client_id) %>%
summarise(presc_count = n(), .groups = "drop") %>%
mutate(is_multi = presc_count > 1)
multi_summary_yearly <- multi_clients_yearly %>%
group_by(year, vacc_status) %>%
summarise(
total_clients = n(),
multi_clients = sum(is_multi),
multi_rate = (multi_clients / total_clients) * 1000,
.groups = "drop"
)
multi_summary_yearly <- multi_summary_yearly %>%
mutate(
period_group = case_when(
year < 2020 ~ "Pre-COVID",
year > 2021 ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
multi_summary_clean <- multi_summary_yearly %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
t_test_results <- multi_summary_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(multi_rate ~ period_group)$p.value,
mean_before = mean(multi_rate[period_group == "Pre-COVID"]),
mean_after = mean(multi_rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before,
.groups = "drop"
)V obou skupinách je možné sledovat nárůst. Ten však ani v jednom případě není statisticky významný. Zajímavé je však sledovat klesavou tendenci u neočkované populace mezi lety 2022 a 2023, zatímco u očkované populace je trend neustále rostoucí.
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání roční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.2265163 | 145.2588 | 193.5101 | 1.332175 |
| Vakcinace | 0.0917678 | 139.7487 | 183.4569 | 1.312763 |
ggplot(multi_summary_yearly, aes(x = year, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.6, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = 2020, xmax = 2021,
ymin = -Inf, ymax = Inf,
alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Roční multipředpisovost na 1000 předepsaných klientů") +
scale_x_continuous(breaks = unique(multi_summary_yearly$year)) +
labs(
title = "Roční multipředpisovost podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Rok",
color = "Vakcinace"
) +
theme_minimal()Rozdíl v rozdílech nárůstů je také vysoko nad přijatelnou hladinou statistické významnosti.
did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year >= 2022, 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
1 2 3 4 5 6 7 8
-0.04517 -0.03659 0.04517 0.03659 -0.10415 -0.05379 0.10415 0.05379
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.97750 0.06542 76.083 1.79e-07 ***
vaccinated -0.03832 0.09252 -0.414 0.7000
post_covid 0.28242 0.09252 3.053 0.0379 *
vaccinated:post_covid -0.01106 0.13084 -0.085 0.9367
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09252 on 4 degrees of freedom
Multiple R-squared: 0.8212, Adjusted R-squared: 0.6871
F-statistic: 6.123 on 3 and 4 DF, p-value: 0.05625
Neočkovaná skupina vykazuje po covidovém období téměř 34% nárůst. Růst je také znatelný ve skupině očkovaných, ten však není statisticky signifikantní.
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání měsíční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0066103 | 78.75462 | 105.67934 | 1.341881 |
| Vakcinace | 0.5186993 | 89.80643 | 95.56067 | 1.064074 |
ggplot(multi_summary_monthly, aes(x = year_month, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.5, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Měsíční multipředpisovost na 1000 předepsaných klientů") +
labs(
title = "Měsíční multipředpisovost podle vakcinačního statusu",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()Interakce vakcíny a post-covidového období nevykazuje známky statistické významnosti.
did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year_month >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
Min 1Q Median 3Q Max
-0.94019 -0.19407 0.03713 0.26729 0.57878
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.31147 0.07575 56.920 <2e-16 ***
vaccinated 0.12265 0.10712 1.145 0.2552
post_covid 0.27837 0.10712 2.599 0.0109 *
vaccinated:post_covid -0.20807 0.15149 -1.373 0.1729
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3711 on 92 degrees of freedom
Multiple R-squared: 0.073, Adjusted R-squared: 0.04277
F-statistic: 2.415 on 3 and 92 DF, p-value: 0.07156
Na první pohled je patrný trend zvyšování poměru injekcí a infuzí na úkor tablet. Injekce také mají všeobecně vyšší míru multipředpisovosti. To může částečně vysvětlovat všeobecně vyšší multipředpisovost v post-covidovém období. Otázkou je, zdali k nárůstu injekcí / infuzí dochází u dětí a mladistvých (s rostoucím věkem) přirozeně a je tedy možné vysvětlit změnu prostým stárnutím populace a nebo se jedná o dlouhodobu změnu.
prescriptions <- prescriptions %>%
mutate(
drug_form_std = case_when(
is.na(drug_form) ~ NA_character_,
str_detect(drug_form, "tableta|Tableta") ~ "Tableta",
str_detect(drug_form, "tobolka|Tobolka") ~ "Tobolka",
str_detect(drug_form, "injekční|Injekční|infuzní|Infuzní") ~ "Injekce / Infuze",
str_detect(drug_form, "č[íi]pek|Čípek") ~ "Čípek",
str_detect(drug_form, "perorální roztok|Perorální roztok") ~ "Perorální roztok",
TRUE ~ "Other"
)
)
prescriptions <- prescriptions %>%
mutate(
year = year(presc_date)
)
ggplot(prescriptions %>% filter(!is.na(presc_date), !is.na(drug_form)), aes(x = factor(year), fill = drug_form_std)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Poměr způsobu podání v čase",
x = "Rok",
y = "%",
fill = "Způsob podání"
) +
theme_minimal()client_yearly_counts <- prescriptions %>%
filter(!is.na(presc_date), !is.na(drug_form)) %>%
group_by(client_id, year, drug_form_std) %>%
summarise(presc_count = n(), .groups = "drop")
average_freq_per_year <- client_yearly_counts %>%
group_by(year, drug_form_std) %>%
summarise(avg_presc_per_client = mean(presc_count), .groups = "drop")
ggplot(average_freq_per_year, aes(x = factor(year), y = avg_presc_per_client, fill = drug_form_std)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Počet předpisů za rok na klienta dle způsobu podání",
x = "Rok",
y = "Průměrný počet předpisů na klienta",
fill = "Způsob podání"
) +
theme_minimal()Poměr injekcí / infuzí na tablety v zásadě potvrzuje trend zobrazený v předchozím grafu.
filtered_data <- prescriptions %>%
mutate(
drug_form_std = ifelse(is.na(drug_form_std), "Unknown", drug_form_std)
) %>%
filter(
drug_form_std %in% c("Injekce / Infuze", "Tableta"),
!is.na(vaccinated),
!is.na(presc_yearmonth)
)
form_vax_counts <- filtered_data %>%
group_by(presc_yearmonth, vaccinated, drug_form_std) %>%
summarise(n = n(), .groups = "drop")
month_totals <- filtered_data %>%
group_by(presc_yearmonth, vaccinated) %>%
summarise(total = n(), .groups = "drop")
form_vax_ratios <- form_vax_counts %>%
left_join(month_totals, by = c("presc_yearmonth", "vaccinated")) %>%
mutate(ratio = n / total)
form_vax_ratios <- form_vax_ratios %>%
mutate(group = paste(drug_form_std, ifelse(vaccinated == "Vakcinace", "Vakcinace", "Bez vakcinace"), sep = " - "))
ggplot(form_vax_ratios, aes(x = presc_yearmonth, y = ratio, color = group)) +
geom_line(size = 0.6) +
geom_smooth(se = FALSE, span = 0.3) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_x_date(date_labels = "%Y",
date_breaks = "1 year",
expand = c(0.01, 0)) +
labs(
title = "Poměr počtu předpisů podle způsobu podání a vakcinace",
x = "Datum",
y = "Poměr počtu předpisů",
color = "Forma podání + Vakcinace"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5))Bohužel je patrné, že pro relevantní zhodnocení rizika prvopředpisu
je v očkované populaci 5-11 příliš málo záznamů. Jediný
významný výsledek je v pátem období (tedy 270 - 365 dní od vakcinace).
Podle tohoto výsledku je riziko více než dvojnásobné (ovšem na velmi
širokém intervalu spolehlivosti).
combined_vaccinated_prescribed <- dbGetQuery(con,
"
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
'CPZP' as ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN cpzp_prescriptions p ON c.client_id = p.client_id
AND p.presc_order = 1
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-08-04' - INTERVAL '7 days'
AND DATE '2021-08-04' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 12 AND 15
UNION
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
'OZP' AS ins_company
FROM OZP_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN ozp_prescriptions p ON c.client_id = p.client_id
AND p.presc_order = 1
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-08-04' - INTERVAL '7 days'
AND DATE '2021-08-04' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 12 AND 15
") %>%
mutate(client_id = row_number()) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_vaccinated <- combined_vaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_vaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_vaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_vaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 480L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 480, number of events= 80
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.1008 1.1061 0.6030 0.167 0.86724
exposure_start_day2 -0.3047 0.7374 0.5294 -0.575 0.56499
exposure_start_day3 0.1008 1.1061 0.3761 0.268 0.78865
exposure_start_day4 0.3885 1.4747 0.3371 1.152 0.24914
exposure_start_day5 0.7835 2.1891 0.2880 2.721 0.00652 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.1061 0.9041 0.3392 3.606
exposure_start_day2 0.7374 1.3562 0.2612 2.081
exposure_start_day3 1.1061 0.9041 0.5293 2.311
exposure_start_day4 1.4747 0.6781 0.7617 2.855
exposure_start_day5 2.1891 0.4568 1.2449 3.849
Concordance= 0.748 (se = 0.038 )
Likelihood ratio test= 8.52 on 5 df, p=0.1
Wald test = 9.2 on 5 df, p=0.1
Score (logrank) test = 9.64 on 5 df, p=0.09
Neočkovaná část populace vykazuje mnohem relevantnější výsledky (i z důvodu většího množství sledovaných osob).
combined_unvaccinated_prescribed <- dbGetQuery(con,
"
SELECT
p.client_id,
DATE '2021-08-04' AS vaccination_date,
p.date AS event_date,
'CPZP' as ins_company
FROM cpzp_prescriptions p
LEFT JOIN cpzp_clients c ON P.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-08-04'- INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-08-04' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 12 AND 15
AND p.date BETWEEN DATE '2021-08-04' - INTERVAL '1 year' AND DATE '2021-08-04' + INTERVAL '1 year'
AND p.presc_order = 1
AND c.n_vacc = 0
UNION
SELECT
p.client_id,
DATE '2021-08-04' AS vaccination_date,
p.date AS event_date,
'OZP' as ins_company
FROM ozp_prescriptions p
LEFT JOIN ozp_clients c ON P.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-08-04' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-08-04' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 12 AND 15
AND p.date BETWEEN DATE '2021-08-04' - INTERVAL '1 year' AND DATE '2021-08-04' + INTERVAL '1 year'
AND p.presc_order = 1
AND c.n_vacc = 0
") %>%
mutate(client_id = row_number()) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_unvaccinated <- combined_unvaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_unvaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_unvaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_unvaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 4566L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 4566, number of events= 761
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.08217 1.08564 0.19380 0.424 0.67157
exposure_start_day2 0.36553 1.44128 0.12674 2.884 0.00393 **
exposure_start_day3 0.27970 1.32274 0.11185 2.501 0.01240 *
exposure_start_day4 0.32579 1.38513 0.10994 2.963 0.00304 **
exposure_start_day5 0.27911 1.32196 0.10921 2.556 0.01060 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.086 0.9211 0.7425 1.587
exposure_start_day2 1.441 0.6938 1.1243 1.848
exposure_start_day3 1.323 0.7560 1.0623 1.647
exposure_start_day4 1.385 0.7220 1.1166 1.718
exposure_start_day5 1.322 0.7565 1.0672 1.637
Concordance= 0.709 (se = 0.014 )
Likelihood ratio test= 17.88 on 5 df, p=0.003
Wald test = 17.87 on 5 df, p=0.003
Score (logrank) test = 18.02 on 5 df, p=0.003
Podle naměřených hodnot je riziko u očkovaných osob mírně nižší nebo srovnatelné až 180 dní od vakcinace, poté se naopak mírně zvýší. P-hodnoty rozdílů mezi skupinami jsou však ve všech případech vysoko nad akceptovatelnou úrovní (což je i mimo výpočet patrné z širokých intervalů spolehlivosti).
coef_unvax <- sccs_result_unvaccinated$coefficients
conf_unvax <- sccs_result_unvaccinated$conf.int
df_unvax <- data.frame(
term = rownames(coef_unvax),
logHR = coef_unvax[, "coef"],
SE = coef_unvax[, "se(coef)"],
HR = coef_unvax[, "exp(coef)"],
conf.low = conf_unvax[, "lower .95"],
conf.high = conf_unvax[, "upper .95"]
)
coef_vax <- sccs_result_vaccinated$coefficients
conf_vax <- sccs_result_vaccinated$conf.int
df_vax <- data.frame(
term = rownames(coef_vax),
logHR = coef_vax[, "coef"],
SE = coef_vax[, "se(coef)"],
HR = coef_vax[, "exp(coef)"],
conf.low = conf_vax[, "lower .95"],
conf.high = conf_vax[, "upper .95"]
)
df_unvax <- df_unvax %>% mutate(group = "Bez Vakcinace")
df_vax <- df_vax %>% mutate(group = "Vakcinace")
combined <- bind_rows(df_vax, df_unvax)
combined$period <- as.integer(gsub("exposure_start_day", "", combined$term))
start_days <- c(0, 30, 90, 180, 270)
end_days <- c(29, 89, 179, 269, Inf)
period_labels <- ifelse(
is.infinite(end_days),
paste0(start_days, "+"),
paste0(start_days, "-", end_days)
)
combined$period_label <- factor(combined$period,
levels = seq_along(period_labels),
labels = period_labels)
ggplot(combined, aes(x = period_label, y = HR, color = group)) +
geom_point(position = position_dodge(width = 0.6), size = 3) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2,
position = position_dodge(width = 0.6)) +
geom_hline(yintercept = 1, linetype = "dashed") +
labs(
x = "Období expozice (dny)",
y = "Poměr rizik (HR)",
title = "Porovnání SCCS: očkovaní vs neočkovaní"
) +
theme_minimal()vax_df <- combined %>%
filter(group == "Vakcinace") %>%
rename_with(~ paste0(., "_vax"), -c(period, period_label))
unvax_df <- combined %>%
filter(group == "Bez Vakcinace") %>%
rename_with(~ paste0(., "_unvax"), -c(period, period_label))
df_diff <- inner_join(vax_df, unvax_df, by = c("period", "period_label"))
df_diff <- df_diff %>%
mutate(
logHR_diff = logHR_vax - logHR_unvax,
se_diff = sqrt(SE_vax^2 + SE_unvax^2),
ci_low = logHR_diff - 1.96 * se_diff,
ci_high = logHR_diff + 1.96 * se_diff
)
ggplot(df_diff, aes(x = period_label, y = logHR_diff)) +
geom_point(color = "blue", size = 3) +
geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Rozdíl v log(HR) (Očkovaní - Neočkovaní)",
x = "Období expozice",
y = "Rozdíl v log(HR)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))common_terms <- intersect(df_vax$term, df_unvax$term)
did_results <- data.frame(
term = character(),
period_label = character(),
logHR_vax = numeric(),
logHR_unvax = numeric(),
DiD_logHR = numeric(),
DiD_SE = numeric(),
Z = numeric(),
P_value = numeric(),
stringsAsFactors = FALSE
)
for (term in common_terms) {
logHR_vax <- df_vax %>% filter(term == !!term) %>% pull(logHR)
SE_vax <- df_vax %>% filter(term == !!term) %>% pull(SE)
logHR_unvax <- df_unvax %>% filter(term == !!term) %>% pull(logHR)
SE_unvax <- df_unvax %>% filter(term == !!term) %>% pull(SE)
DiD_logHR <- logHR_vax - logHR_unvax
DiD_SE <- sqrt(SE_vax^2 + SE_unvax^2)
Z <- DiD_logHR / DiD_SE
P_value <- 2 * pnorm(-abs(Z))
period_idx <- as.integer(gsub("exposure_start_day", "", term))
period_label <- period_labels[period_idx]
did_results <- rbind(did_results, data.frame(
term = term,
period_label = period_label,
logHR_vax = logHR_vax,
logHR_unvax = logHR_unvax,
DiD_logHR = DiD_logHR,
DiD_SE = DiD_SE,
Z = Z,
P_value = P_value
))
}
did_results %>%
transmute(Období = period_label, `log(HR) Vakcinace` = logHR_vax, `log(HR) Bez vakcinace` = logHR_unvax, `DiD log(HR)` = DiD_logHR, p_value = P_value) %>%
kable(format = "html", caption = "Porovnání rizik očkovaných a neočkovaných (SCCS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Období | log(HR) Vakcinace | log(HR) Bez vakcinace | DiD log(HR) | p_value |
|---|---|---|---|---|
| 0-29 | 0.1008047 | 0.0821706 | 0.0186341 | 0.9765303 |
| 30-89 | -0.3046604 | 0.3655330 | -0.6701934 | 0.2182940 |
| 90-179 | 0.1008047 | 0.2797016 | -0.1788969 | 0.6484026 |
| 180-269 | 0.3884868 | 0.3257927 | 0.0626941 | 0.8596533 |
| 270+ | 0.7834806 | 0.2791118 | 0.5043688 | 0.1015081 |
U celkového počtu předpisů je výhoda ve větším množství záznamů a tedy i vyšší pravděpodobnosti, že některý z výsledků bude významný. To se bohužel u očkované skupiny nepotvrdilo a jediný signifikantní výsledek je opět v období 270 - 365 dní po vakcinaci.
combined_vaccinated_prescribed <- dbGetQuery(con,
"
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
d.drug_form,
'CPZP' as ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN cpzp_prescriptions p ON c.client_id = p.client_id
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-08-04' - INTERVAL '7 days'
AND DATE '2021-08-04' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 12 AND 15
UNION
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
d.drug_form,
'OZP' AS ins_company
FROM OZP_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN ozp_prescriptions p ON c.client_id = p.client_id
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-08-04' - INTERVAL '7 days'
AND DATE '2021-08-04' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 12 AND 15
") %>%
mutate(client_id = as.numeric(factor(client_id)),
drug_form = tolower(drug_form),
inj_or_inf = str_detect(drug_form, "injekční|infuzní"))
injectable_infusional <- combined_vaccinated_prescribed %>%
filter(inj_or_inf) %>%
arrange(client_id, event_date) %>%
group_by(client_id) %>%
mutate(
day_diff = as.numeric(difftime(event_date, lag(event_date, default = first(event_date)), units = "days")),
new_window = is.na(day_diff) | day_diff > 10,
window_id = cumsum(new_window)
) %>%
group_by(client_id, window_id) %>%
slice(1) %>%
ungroup()
other_drugs <- combined_vaccinated_prescribed %>%
filter(!inj_or_inf)
combined_vaccinated_prescribed <- bind_rows(injectable_infusional, other_drugs) %>%
arrange(client_id, event_date) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_vaccinated <- combined_vaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_vaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_vaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_vaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 1122L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 1122, number of events= 187
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 -0.14036 0.86905 0.42258 -0.332 0.73978
exposure_start_day2 0.01379 1.01389 0.28868 0.048 0.96189
exposure_start_day3 -0.08629 0.91733 0.25404 -0.340 0.73410
exposure_start_day4 0.18815 1.20701 0.22783 0.826 0.40890
exposure_start_day5 0.56829 1.76525 0.19377 2.933 0.00336 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 0.8690 1.1507 0.3796 1.989
exposure_start_day2 1.0139 0.9863 0.5758 1.785
exposure_start_day3 0.9173 1.0901 0.5576 1.509
exposure_start_day4 1.2070 0.8285 0.7723 1.886
exposure_start_day5 1.7653 0.5665 1.2075 2.581
Concordance= 0.752 (se = 0.026 )
Likelihood ratio test= 9.6 on 5 df, p=0.09
Wald test = 10.51 on 5 df, p=0.06
Score (logrank) test = 10.79 on 5 df, p=0.06
Naopak u neočkované populace je díky výrazně většímu množství záznamů možné zhodnotit rizika s větší jistotou. Výsledek v zásadě říká, že po celou dobu od rozhodného data (vyjma prvního měsíce) je riziko zhruba jeden a půl násobné.
combined_unvaccinated_prescribed <- dbGetQuery(con,
"
SELECT
p.client_id,
DATE '2021-08-04' AS vaccination_date,
p.date AS event_date,
d.drug_form,
'CPZP' as ins_company
FROM cpzp_prescriptions p
LEFT JOIN cpzp_clients c ON P.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-08-04' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-08-04' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 12 AND 15
AND p.date BETWEEN DATE '2021-08-04' - INTERVAL '1 year' AND DATE '2021-08-04' + INTERVAL '1 year'
AND c.n_vacc = 0
UNION
SELECT
p.client_id,
DATE '2021-08-04' AS vaccination_date,
p.date AS event_date,
d.drug_form,
'OZP' as ins_company
FROM ozp_prescriptions p
LEFT JOIN ozp_clients c ON P.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-08-04' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-08-04' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 12 AND 15
AND p.date BETWEEN DATE '2021-08-04' - INTERVAL '1 year' AND DATE '2021-08-04' + INTERVAL '1 year'
AND c.n_vacc = 0
") %>%
mutate(client_id = as.numeric(factor(client_id)),
drug_form = tolower(drug_form),
inj_or_inf = str_detect(drug_form, "injekční|infuzní"))
injectable_infusional <- combined_unvaccinated_prescribed %>%
filter(inj_or_inf) %>%
arrange(client_id, event_date) %>%
group_by(client_id) %>%
mutate(
day_diff = as.numeric(difftime(event_date, lag(event_date, default = first(event_date)), units = "days")),
new_window = is.na(day_diff) | day_diff > 10,
window_id = cumsum(new_window)
) %>%
group_by(client_id, window_id) %>%
slice(1) %>%
ungroup()
other_drugs <- combined_unvaccinated_prescribed %>%
filter(!inj_or_inf)
combined_unvaccinated_prescribed <- bind_rows(injectable_infusional, other_drugs) %>%
arrange(client_id, event_date) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_unvaccinated <- combined_unvaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_unvaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_unvaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_unvaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 10308L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 10308, number of events= 1718
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.08648 1.09033 0.13151 0.658 0.511
exposure_start_day2 0.39769 1.48838 0.08507 4.675 2.94e-06 ***
exposure_start_day3 0.33367 1.39608 0.07453 4.477 7.57e-06 ***
exposure_start_day4 0.43198 1.54030 0.07189 6.009 1.87e-09 ***
exposure_start_day5 0.38231 1.46567 0.07150 5.347 8.95e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.090 0.9172 0.8426 1.411
exposure_start_day2 1.488 0.6719 1.2598 1.758
exposure_start_day3 1.396 0.7163 1.2063 1.616
exposure_start_day4 1.540 0.6492 1.3379 1.773
exposure_start_day5 1.466 0.6823 1.2740 1.686
Concordance= 0.705 (se = 0.009 )
Likelihood ratio test= 63.17 on 5 df, p=3e-12
Wald test = 62.83 on 5 df, p=3e-12
Score (logrank) test = 63.63 on 5 df, p=2e-12
Rozdíl mezi riziky je všeobecně vyhodnocen jako menší pro očkovanou populaci. Ani jedno z porovnání však není statisticky významné.
coef_unvax <- sccs_result_unvaccinated$coefficients
conf_unvax <- sccs_result_unvaccinated$conf.int
df_unvax <- data.frame(
term = rownames(coef_unvax),
logHR = coef_unvax[, "coef"],
SE = coef_unvax[, "se(coef)"],
HR = coef_unvax[, "exp(coef)"],
conf.low = conf_unvax[, "lower .95"],
conf.high = conf_unvax[, "upper .95"]
)
coef_vax <- sccs_result_vaccinated$coefficients
conf_vax <- sccs_result_vaccinated$conf.int
df_vax <- data.frame(
term = rownames(coef_vax),
logHR = coef_vax[, "coef"],
SE = coef_vax[, "se(coef)"],
HR = coef_vax[, "exp(coef)"],
conf.low = conf_vax[, "lower .95"],
conf.high = conf_vax[, "upper .95"]
)
df_unvax <- df_unvax %>% mutate(group = "Unvaccinated")
df_vax <- df_vax %>% mutate(group = "Vaccinated")
combined <- bind_rows(df_vax, df_unvax)
combined$period <- as.integer(gsub("exposure_start_day", "", combined$term))
start_days <- c(0, 30, 90, 180, 270)
end_days <- c(29, 89, 179, 269, Inf)
period_labels <- ifelse(
is.infinite(end_days),
paste0(start_days, "+"),
paste0(start_days, "-", end_days)
)
combined$period_label <- factor(combined$period,
levels = seq_along(period_labels),
labels = period_labels)
ggplot(combined, aes(x = period_label, y = HR, color = group)) +
geom_point(position = position_dodge(width = 0.6), size = 3) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2,
position = position_dodge(width = 0.6)) +
geom_hline(yintercept = 1, linetype = "dashed") +
labs(
x = "Období expozice (dny)",
y = "Poměr rizik (HR)",
title = "Porovnání SCCS: očkovaní vs neočkovaní"
) +
theme_minimal()vax_df <- combined %>%
filter(group == "Vaccinated") %>%
rename_with(~ paste0(., "_vax"), -c(period, period_label))
unvax_df <- combined %>%
filter(group == "Unvaccinated") %>%
rename_with(~ paste0(., "_unvax"), -c(period, period_label))
df_diff <- inner_join(vax_df, unvax_df, by = c("period", "period_label"))
df_diff <- df_diff %>%
mutate(
logHR_diff = logHR_vax - logHR_unvax,
se_diff = sqrt(SE_vax^2 + SE_unvax^2),
ci_low = logHR_diff - 1.96 * se_diff,
ci_high = logHR_diff + 1.96 * se_diff
)
ggplot(df_diff, aes(x = period_label, y = logHR_diff)) +
geom_point(color = "blue", size = 3) +
geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Rozdíl v log(HR) (Očkovaní - Neočkovaní)",
x = "Období expozice",
y = "Rozdíl v log(HR)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))common_terms <- intersect(df_vax$term, df_unvax$term)
did_results <- data.frame(
term = character(),
period_label = character(),
logHR_vax = numeric(),
logHR_unvax = numeric(),
DiD_logHR = numeric(),
DiD_SE = numeric(),
Z = numeric(),
P_value = numeric(),
stringsAsFactors = FALSE
)
for (term in common_terms) {
logHR_vax <- df_vax %>% filter(term == !!term) %>% pull(logHR)
SE_vax <- df_vax %>% filter(term == !!term) %>% pull(SE)
logHR_unvax <- df_unvax %>% filter(term == !!term) %>% pull(logHR)
SE_unvax <- df_unvax %>% filter(term == !!term) %>% pull(SE)
DiD_logHR <- logHR_vax - logHR_unvax
DiD_SE <- sqrt(SE_vax^2 + SE_unvax^2)
Z <- DiD_logHR / DiD_SE
P_value <- 2 * pnorm(-abs(Z))
period_idx <- as.integer(gsub("exposure_start_day", "", term))
period_label <- period_labels[period_idx]
did_results <- rbind(did_results, data.frame(
term = term,
period_label = period_label,
logHR_vax = logHR_vax,
logHR_unvax = logHR_unvax,
DiD_logHR = DiD_logHR,
DiD_SE = DiD_SE,
Z = Z,
P_value = P_value
))
}
did_results %>%
transmute(Období = period_label, `log(HR) Vakcinace` = logHR_vax, `log(HR) Bez vakcinace` = logHR_unvax, `DiD log(HR)` = DiD_logHR, p_value = P_value) %>%
kable(format = "html", caption = "Porovnání rizik (SCCS) očkovaných a neočkovaných") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Období | log(HR) Vakcinace | log(HR) Bez vakcinace | DiD log(HR) | p_value |
|---|---|---|---|---|
| 0-29 | -0.1403574 | 0.0864778 | -0.2268352 | 0.6082714 |
| 30-89 | 0.0137933 | 0.3976904 | -0.3838971 | 0.2020880 |
| 90-179 | -0.0862901 | 0.3336685 | -0.4199587 | 0.1126796 |
| 180-269 | 0.1881467 | 0.4319795 | -0.2438327 | 0.3074186 |
| 270+ | 0.5682940 | 0.3823111 | 0.1859829 | 0.3678679 |
Z dlouhodobého hlediska je nárůst ve všech sledovaných kategoriích naprosto zřejmý. Rozdíl mezi očkovanou a neočkovanou populací však není možné prokázat ani v jednom případě. Změny ve způsobu podání (vyšší podíl injekcí a infuzí na úkor tablet) mohou částečně vysvětlit zvýšenou multipředpisovost, je však otázkou zdali se jedná o přirozený jev způsobený stárnutím populace a nebo je vliv jiný.
Co se týče vyhodnocení rizika v krátkodobém horizontu, trpí dataset na nedostatek relevantních dat. Ve sledovaných obdobích nebyl nalezen jediný statisticky významný rozdíl mezi očkovanou a neočkovanou populací.
I v této věkové kategorii jsou patrné rozdíly, výrazný rozdíl způsobený odlišnými riziky na základě věku by však ani zde neměl nastat.
unique_clients <- prescriptions %>%
filter(!is.na(age_2021), !is.na(sex), !is.na(vaccinated)) %>%
distinct(client_id, .keep_all = TRUE)
age_data <- unique_clients %>%
group_by(vaccinated, age_2021) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Age",
value = as.character(age_2021)) %>%
ungroup()
sex_data <- unique_clients %>%
group_by(vaccinated, sex) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Sex",
value = sex) %>%
ungroup()ggplot(unique_clients, aes(x = age_2021, fill = factor(vaccinated))) +
geom_histogram(aes(y = after_stat(density)), position = "identity", alpha = 0.4, binwidth = 1) +
scale_fill_brewer(palette = "Set1", name = "Vaccinated") +
scale_x_continuous(breaks = seq(floor(min(unique_clients$age)), ceiling(max(unique_clients$age)), by = 1)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Věkové rozložení podle statusu vakcinace",
x = "Věk", y = "%",
fill = "Vakcinace"
) +
theme_minimal()ggplot(sex_data, aes(x = sex, y = prop, color = vaccinated, group = vaccinated)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Pohlaví podle vakcinace",
x = "Pohlaví",
y = "%",
color = "Vakcinace"
) +
theme_minimal()prescriptions_clean <- prescriptions %>%
filter(!is.na(presc_date)) %>%
mutate(presc_date = as.Date(presc_date)) %>%
arrange(presc_date) %>%
filter(presc_date >= as.Date("2015-01-01") & presc_date <= as.Date("2023-12-31")) %>%
mutate(count = 1)
results_vacc <- prescriptions_clean %>%
filter(!is.na(vaccinated)) %>%
group_by(vaccinated) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$vaccinated)
analyze_group(df = filter(prescriptions_clean, vaccinated == group_val),
group_id = group_val,
group_type = "Vaccination")
}) %>%
compact() %>%
bind_rows()
results_sex <- prescriptions_clean %>%
filter(!is.na(sex)) %>%
group_by(sex) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$sex)
analyze_group(df = filter(prescriptions_clean, sex == group_val),
group_id = group_val,
group_type = "Sex")
}) %>%
compact() %>%
bind_rows()
results_all <- bind_rows(results_vacc, results_sex) %>%
arrange(p_value)
plot_data_all <- results_all %>%
unnest(plot_data, names_sep = "_")Výkyvy od předpokládaného trendu jsou zde ve všech skupinách. Vyjma vakcinované kategorie jsou také všechny statisticky významné.
results_all %>%
mutate(Skupina = group, RMSE = rmse, rRMSE = rrmse) %>%
select(
Skupina, RMSE, rRMSE, t_stat, p_value
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (ETS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | t_stat | p_value |
|---|---|---|---|---|
| F | 140.34671 | 0.2687347 | 12.2492343 | 0.0000000 |
| M | 115.36129 | 0.2724855 | -6.5066174 | 0.0000000 |
| Bez vakcinace | 40.72322 | 0.1418763 | 2.7188140 | 0.0091492 |
| Vakcinace | 96.13567 | 0.1459734 | -0.1575997 | 0.8754474 |
ggplot(plot_data_all, aes(x = plot_data_month)) +
geom_line(aes(y = plot_data_prescriptions), color = "blue") +
geom_line(aes(y = plot_data_predicted), color = "red", linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
facet_wrap(~ paste(plot_data_group_type, plot_data_group, sep = ": "), scales = "free_y") +
labs(
title = "Realita vs Předpověď (ETS)",
subtitle = "Modrá = Realita, Červená = Předpověď",
x = "Datum", y = "Počet předpisů"
) +
theme_minimal()first_presc <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'CPZP' AS ins_company
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 16 AND 29
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'OZP' AS ins_company
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 16 AND 29
") %>%
mutate(first_presc_yearmonth = floor_date(first_presc_date, unit = "month"),
sex = ifelse(sex == "Z", "F", sex),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"),
age_2021 = 2021 - birthyear)
monthly_first_presc <- first_presc %>%
count(first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(first_presc_yearmonth) %>%
mutate(
cum_new_presc = cumsum(new_prescriptions),
total_population = nrow(first_presc),
at_risk_population = total_population - lag(cum_new_presc, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_clean <- monthly_first_presc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
ttest <- t.test(rate ~ period_group, data = monthly_clean)V porovnání před-covidového a po-covidového období došlo k jednoznačnému nárůstu prvopředpisovosti.
ggplot(monthly_first_presc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000)) +
geom_line(color = "steelblue", size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet prvopředpisů na 1000 osob bez předpisu", limits = c(0, 3)) +
labs(
title = "Prvopředpisovost v čase",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum"
) +
theme_minimal()
Welch Two Sample t-test
data: rate by period_group
t = 5.6304, df = 45.941, p-value = 1.037e-06
alternative hypothesis: true difference in means between group Post-COVID and group Pre-COVID is not equal to 0
95 percent confidence interval:
0.0002250338 0.0004754821
sample estimates:
mean in group Post-COVID mean in group Pre-COVID
0.002210366 0.001860108
total_by_vacc <- first_presc %>%
count(vacc_status, name = "total_population")
monthly_by_vacc <- first_presc %>%
count(vacc_status, first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(vacc_status, first_presc_yearmonth) %>%
group_by(vacc_status) %>%
mutate(
cum_prescriptions = cumsum(new_prescriptions),
total_population = total_by_vacc$total_population[match(vacc_status, total_by_vacc$vacc_status)],
at_risk_population = total_population - lag(cum_prescriptions, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
ungroup() %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_by_vacc_clean <- monthly_by_vacc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
first_presc_table <- monthly_by_vacc_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(rate ~ period_group)$p.value,
mean_before = mean(rate[period_group == "Pre-COVID"]),
mean_after = mean(rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before
)Rozdíly mezi skupinami jsou však nepatrné až téměř nulové. To také potvrzuje p-hodnota příslušného DiD modelu.
first_presc_table %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before * 1000, `Průměr po` = mean_after * 1000, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání prvopředpisovosti v čase podle vakcinace") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0001376 | 1.707657 | 2.032872 | 1.190445 |
| Vakcinace | 0.0000011 | 1.934736 | 2.298202 | 1.187863 |
ggplot(monthly_by_vacc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000, color = vacc_status)) +
geom_line(size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet prvopředpisů na 1000 osob bez předpisu", limits = c(0, 3)) +
labs(
title = "Prvopředpisovost v čase podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()monthly_rates <- monthly_by_vacc %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID")) %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(first_presc_yearmonth >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid)
did_model <- lm(log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
Residuals:
Min 1Q Median 3Q Max
-0.28041 -0.09699 -0.01711 0.09465 0.48581
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.54788 0.01810 30.266 < 2e-16 ***
vaccinated 0.10123 0.02560 3.954 0.000114 ***
post_covid 0.15113 0.03387 4.463 1.5e-05 ***
vaccinated:post_covid 0.02799 0.04789 0.585 0.559682
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1402 on 164 degrees of freedom
Multiple R-squared: 0.3091, Adjusted R-squared: 0.2965
F-statistic: 24.46 on 3 and 164 DF, p-value: 3.923e-13
presc_id_counts <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'CPZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 16 AND 29
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'OZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 16 AND 29
")%>%
mutate(presc_year = as.integer(format(as.Date(presc_date), "%Y")),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"))
monthly_unique_clients <- presc_id_counts %>%
mutate(yearmonth = floor_date(presc_date, unit = "month")) %>%
distinct(client_id, yearmonth, vacc_status) %>%
count(yearmonth, vacc_status, name = "n_unique_clients")
presc_years <- presc_id_counts %>%
mutate(
presc_year = year(presc_date)
) %>%
distinct(client_id, first_presc_year, presc_year, vacc_status) %>%
filter(first_presc_year >= 2017, first_presc_year <= 2023)
retention_data <- presc_years %>%
mutate(year_offset = presc_year - first_presc_year) %>%
filter(year_offset >= 0)
retention_summary <- retention_data %>%
group_by(first_presc_year, year_offset, vacc_status) %>%
summarise(n_clients = n_distinct(client_id), .groups = "drop")
cohort_sizes <- retention_data %>%
filter(year_offset == 0) %>%
group_by(first_presc_year, vacc_status) %>%
summarise(cohort_size = n_distinct(client_id), .groups = "drop")
retention_summary <- retention_summary %>%
left_join(cohort_sizes, by = c("first_presc_year", "vacc_status")) %>%
mutate(retention_rate = n_clients / cohort_size)
retention_summary_filtered <- retention_summary %>%
filter(year_offset > 0)Retence je všeobecně vyšší než u mladší populace a u očkované skupiny je zřejmá vyšší míra udržení “do dalších let”. Reálné zhodnocení vlivu covidu však bude možné až v budoucnosti.
ggplot(retention_summary_filtered, aes(x = year_offset, y = retention_rate, color = as.factor(first_presc_year))) +
geom_line(size = 1.2) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_color_viridis_d(name = "Rok prvopředpisu") +
labs(
title = "Retence (meziroční předpisovost) podle vakcinace",
x = "Počet let od prvopředpisu",
y = "Míra multipředpisovosti"
) +
facet_wrap(~ vacc_status) +
theme_minimal(base_size = 13)multi_presc_yearly <- multi_presc %>%
mutate(year = year(date)) %>%
filter(year > 2017)
multi_clients_yearly <- multi_presc_yearly %>%
group_by(year, vacc_status, client_id) %>%
summarise(presc_count = n(), .groups = "drop") %>%
mutate(is_multi = presc_count > 1)
multi_summary_yearly <- multi_clients_yearly %>%
group_by(year, vacc_status) %>%
summarise(
total_clients = n(),
multi_clients = sum(is_multi),
multi_rate = (multi_clients / total_clients) * 1000,
.groups = "drop"
)
multi_summary_yearly <- multi_summary_yearly %>%
mutate(
period_group = case_when(
year < 2020 ~ "Pre-COVID",
year > 2021 ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
multi_summary_clean <- multi_summary_yearly %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
t_test_results <- multi_summary_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(multi_rate ~ period_group)$p.value,
mean_before = mean(multi_rate[period_group == "Pre-COVID"]),
mean_after = mean(multi_rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before,
.groups = "drop"
)V obou skupinách došlo k nárůstu od původního trendu s rozdílem toho, že v očkované populaci došlo nejprve k poklesu během covidu. Rozdíly mezi skupinami není možné vyhodnotit jako významné.
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání roční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0716040 | 271.5376 | 311.866 | 1.148518 |
| Vakcinace | 0.1980032 | 259.4338 | 288.016 | 1.110172 |
ggplot(multi_summary_yearly, aes(x = year, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.6, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = 2020, xmax = 2021,
ymin = -Inf, ymax = Inf,
alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Roční multipředpisovost na 1000 předepsaných klientů") +
scale_x_continuous(breaks = unique(multi_summary_yearly$year)) +
labs(
title = "Roční multipředpisovost podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Rok",
color = "Vakcinace"
) +
theme_minimal()did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year >= 2022, 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
1 2 3 4 5 6 7 8
-0.02632 -0.04354 0.02632 0.04354 -0.01038 -0.01856 0.01038 0.01856
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.60375 0.02757 203.247 3.52e-09 ***
vaccinated -0.04620 0.03899 -1.185 0.3016
post_covid 0.13877 0.03899 3.559 0.0236 *
vaccinated:post_covid -0.03348 0.05514 -0.607 0.5766
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.03899 on 4 degrees of freedom
Multiple R-squared: 0.8629, Adjusted R-squared: 0.76
F-statistic: 8.389 on 3 and 4 DF, p-value: 0.03361
Změny v měsíční předpisovosti v zásadě jen potvrzují předchozí sledování s tím rozdílem, že u očkované skupiny došlo jen k mírnému růstu. Ovšem v tomto případě je rozdíl mezi rozdíly statisticky signifikantní (interakce vaccinated:post_covid).
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání měsíční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0040403 | 149.3460 | 168.9455 | 1.131235 |
| Vakcinace | 0.7959155 | 147.3262 | 148.4100 | 1.007356 |
ggplot(multi_summary_monthly, aes(x = year_month, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.5, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Měsíční multipředpisovost na 1000 předepsaných klientů") +
labs(
title = "Měsíční multipředpisovost podle vakcinačního statusu",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year_month >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
Min 1Q Median 3Q Max
-0.266633 -0.081329 0.002076 0.091950 0.284863
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.994406 0.024953 200.152 < 2e-16 ***
vaccinated -0.007215 0.035289 -0.204 0.838443
post_covid 0.127420 0.035289 3.611 0.000497 ***
vaccinated:post_covid -0.118159 0.049906 -2.368 0.019997 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1222 on 92 degrees of freedom
Multiple R-squared: 0.1798, Adjusted R-squared: 0.153
F-statistic: 6.722 on 3 and 92 DF, p-value: 0.0003761
Vývoj zde jen potvrzuje trend nastolený v předchozí sledované skupině a tedy, že s rostoucím věkem se preferují injekce / infuze na úkor tablet.
prescriptions <- prescriptions %>%
mutate(
drug_form_std = case_when(
is.na(drug_form) ~ NA_character_,
str_detect(drug_form, "tableta|Tableta") ~ "Tableta",
str_detect(drug_form, "tobolka|Tobolka") ~ "Tobolka",
str_detect(drug_form, "injekční|Injekční|infuzní|Infuzní") ~ "Injekce / Infuze",
str_detect(drug_form, "č[íi]pek|Čípek") ~ "Čípek",
str_detect(drug_form, "perorální roztok|Perorální roztok") ~ "Perorální roztok",
TRUE ~ "Other"
)
)
prescriptions <- prescriptions %>%
mutate(
year = year(presc_date)
)
ggplot(prescriptions %>% filter(!is.na(presc_date), !is.na(drug_form)), aes(x = factor(year), fill = drug_form_std)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Poměr způsobu podání v čase",
x = "Rok",
y = "%",
fill = "Způsob podání"
) +
theme_minimal()client_yearly_counts <- prescriptions %>%
filter(!is.na(presc_date), !is.na(drug_form)) %>%
group_by(client_id, year, drug_form_std) %>%
summarise(presc_count = n(), .groups = "drop")
average_freq_per_year <- client_yearly_counts %>%
group_by(year, drug_form_std) %>%
summarise(avg_presc_per_client = mean(presc_count), .groups = "drop")
ggplot(average_freq_per_year, aes(x = factor(year), y = avg_presc_per_client, fill = drug_form_std)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Počet předpisů za rok na klienta dle způsobu podání",
x = "Rok",
y = "Průměrný počet předpisů na klienta",
fill = "Způsob podání"
) +
theme_minimal()Při sledování poměru předpisů je zajímavé sledovat výkyv neočkované populace ve druhé polovině roku 2021. K tomuto výkyvu dochází u všech dospělých neočkovaných skupin a je způsoben jak snížením celkového množství injekcí, tak zvýšením celkového množství tablet. Může se jednat o vlivy covidových nařízení a částečné nepřístupnosti ambulantní léčby pro neočkovanou populaci.
filtered_data <- prescriptions %>%
mutate(
drug_form_std = ifelse(is.na(drug_form_std), "Unknown", drug_form_std)
) %>%
filter(
drug_form_std %in% c("Injekce / Infuze", "Tableta"),
!is.na(vaccinated),
!is.na(presc_yearmonth)
)
form_vax_counts <- filtered_data %>%
group_by(presc_yearmonth, vaccinated, drug_form_std) %>%
summarise(n = n(), .groups = "drop")
month_totals <- filtered_data %>%
group_by(presc_yearmonth, vaccinated) %>%
summarise(total = n(), .groups = "drop")
form_vax_ratios <- form_vax_counts %>%
left_join(month_totals, by = c("presc_yearmonth", "vaccinated")) %>%
mutate(ratio = n / total)
form_vax_ratios <- form_vax_ratios %>%
mutate(group = paste(drug_form_std, ifelse(vaccinated == "Vakcinace", "Vakcinace", "Bez vakcinace"), sep = " - "))
ggplot(form_vax_ratios, aes(x = presc_yearmonth, y = ratio, color = group)) +
geom_line(size = 0.6) +
geom_smooth(se = FALSE, span = 0.3) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_x_date(date_labels = "%Y",
date_breaks = "1 year",
expand = c(0.01, 0)) +
labs(
title = "Poměr počtu předpisů podle způsobu podání a vakcinace",
x = "Datum",
y = "Poměr počtu předpisů",
color = "Forma podání + Vakcinace"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5))V rámci dlouhodobého vývoje jsou opět všechna rizika větší v době po covidu oproti době před covidem. Průběh očkovaných a neočkovaných je však srovnatelný. Jedinou výjimkou je měsíční multipředpisovost, kde očkovaní podléhají mírně nižšímu riziku. Poměry způsobu podání v zásadě kopírují trend nastavený v mladší kategorii - a tedy nárůst injekcí / infuzí na úkor tablet s jedinou výjimkou ve druhé polovině roku 2021, kde výrazně rostou tablety a klesají injekce / infuze.
Krátkodobá analýza je sloučená se skupinou 30-39 z důvodu nízké relevance výsledků.
Demografické rozložení je v tomto případě mírně vyváženější než u mladší populace.
unique_clients <- prescriptions %>%
filter(!is.na(age_2021), !is.na(sex), !is.na(vaccinated)) %>%
distinct(client_id, .keep_all = TRUE)
age_data <- unique_clients %>%
group_by(vaccinated, age_2021) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Age",
value = as.character(age_2021)) %>%
ungroup()
sex_data <- unique_clients %>%
group_by(vaccinated, sex) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Sex",
value = sex) %>%
ungroup()ggplot(unique_clients, aes(x = age_2021, fill = factor(vaccinated))) +
geom_histogram(aes(y = after_stat(density)), position = "identity", alpha = 0.4, binwidth = 1) +
scale_fill_brewer(palette = "Set1", name = "Vaccinated") +
scale_x_continuous(breaks = seq(floor(min(unique_clients$age)), ceiling(max(unique_clients$age)), by = 1)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Věkové rozložení podle statusu vakcinace",
x = "Věk", y = "%",
fill = "Vakcinace"
) +
theme_minimal()ggplot(sex_data, aes(x = sex, y = prop, color = vaccinated, group = vaccinated)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Pohlaví podle vakcinace",
x = "Pohlaví",
y = "%",
color = "Vakcinace"
) +
theme_minimal()prescriptions_clean <- prescriptions %>%
filter(!is.na(presc_date)) %>%
mutate(presc_date = as.Date(presc_date)) %>%
arrange(presc_date) %>%
filter(presc_date >= as.Date("2015-01-01") & presc_date <= as.Date("2023-12-31")) %>%
mutate(count = 1)
results_vacc <- prescriptions_clean %>%
filter(!is.na(vaccinated)) %>%
group_by(vaccinated) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$vaccinated)
analyze_group(df = filter(prescriptions_clean, vaccinated == group_val),
group_id = group_val,
group_type = "Vaccination")
}) %>%
compact() %>%
bind_rows()
results_sex <- prescriptions_clean %>%
filter(!is.na(sex)) %>%
group_by(sex) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$sex)
analyze_group(df = filter(prescriptions_clean, sex == group_val),
group_id = group_val,
group_type = "Sex")
}) %>%
compact() %>%
bind_rows()
results_all <- bind_rows(results_vacc, results_sex) %>%
arrange(p_value)
plot_data_all <- results_all %>%
unnest(plot_data, names_sep = "_")Rozdíly očekávaného úhrnu předpisů jsou napříč všemi kategoriemi téměř identické.
results_all %>%
mutate(Skupina = group, RMSE = rmse, rRMSE = rrmse) %>%
select(
Skupina, RMSE, rRMSE, t_stat, p_value
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (ETS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | t_stat | p_value |
|---|---|---|---|---|
| M | 78.20163 | 0.1367719 | -3.374315 | 0.0014910 |
| Bez vakcinace | 59.18631 | 0.1257455 | -2.990672 | 0.0044209 |
| F | 118.90336 | 0.1315060 | -2.603776 | 0.0123016 |
| Vakcinace | 127.09959 | 0.1264358 | -2.449810 | 0.0180735 |
ggplot(plot_data_all, aes(x = plot_data_month)) +
geom_line(aes(y = plot_data_prescriptions), color = "blue") +
geom_line(aes(y = plot_data_predicted), color = "red", linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
facet_wrap(~ paste(plot_data_group_type, plot_data_group, sep = ": "), scales = "free_y") +
labs(
title = "Realita vs Předpověď (ETS)",
subtitle = "Modrá = Realita, Červená = Předpověď",
x = "Datum", y = "Počet předpisů"
) +
theme_minimal()first_presc <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'CPZP' AS ins_company
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 30 AND 39
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'OZP' AS ins_company
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 30 AND 39
") %>%
mutate(first_presc_yearmonth = floor_date(first_presc_date, unit = "month"),
sex = ifelse(sex == "Z", "F", sex),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"),
age_2021 = 2021 - birthyear)
monthly_first_presc <- first_presc %>%
count(first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(first_presc_yearmonth) %>%
mutate(
cum_new_presc = cumsum(new_prescriptions),
total_population = nrow(first_presc),
at_risk_population = total_population - lag(cum_new_presc, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_clean <- monthly_first_presc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
ttest <- t.test(rate ~ period_group, data = monthly_clean)Stejně jako u jiných věkových skupin zde došlo k významnému nárůstu prvopředpisovosti.
ggplot(monthly_first_presc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000)) +
geom_line(color = "steelblue", size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet prvopředpisů na 1000 osob bez předpisu", limits = c(0, 5)) +
labs(
title = "Prvopředpisovost v čase",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum"
) +
theme_minimal()
Welch Two Sample t-test
data: rate by period_group
t = 4.3496, df = 45.993, p-value = 7.499e-05
alternative hypothesis: true difference in means between group Post-COVID and group Pre-COVID is not equal to 0
95 percent confidence interval:
0.0001565338 0.0004262142
sample estimates:
mean in group Post-COVID mean in group Pre-COVID
0.003192046 0.002900672
total_by_vacc <- first_presc %>%
count(vacc_status, name = "total_population")
monthly_by_vacc <- first_presc %>%
count(vacc_status, first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(vacc_status, first_presc_yearmonth) %>%
group_by(vacc_status) %>%
mutate(
cum_prescriptions = cumsum(new_prescriptions),
total_population = total_by_vacc$total_population[match(vacc_status, total_by_vacc$vacc_status)],
at_risk_population = total_population - lag(cum_prescriptions, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
ungroup() %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_by_vacc_clean <- monthly_by_vacc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
first_presc_table <- monthly_by_vacc_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(rate ~ period_group)$p.value,
mean_before = mean(rate[period_group == "Pre-COVID"]),
mean_after = mean(rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before
)K tomuto nárůstu došlo napříč vakcinovanou i nevakcinovanou populací a rozdíly mezi populacemi nejsou významné.
first_presc_table %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before * 1000, `Průměr po` = mean_after * 1000, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání prvopředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0197657 | 2.625809 | 2.816226 | 1.072518 |
| Vakcinace | 0.0000141 | 3.055007 | 3.407820 | 1.115487 |
ggplot(monthly_by_vacc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000, color = vacc_status)) +
geom_line(size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet prvopředpisů na 1000 osob bez předpisu", limits = c(0, 5)) +
labs(
title = "Prvopředpisovost v čase podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()monthly_rates <- monthly_by_vacc %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID")) %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(first_presc_yearmonth >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid)
did_model <- lm(log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
Residuals:
Min 1Q Median 3Q Max
-0.37499 -0.07649 -0.01110 0.06480 0.48692
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.026739 0.017329 59.249 < 2e-16 ***
vaccinated 0.133896 0.024507 5.464 1.7e-07 ***
post_covid 0.003442 0.032420 0.106 0.916
vaccinated:post_covid 0.059457 0.045849 1.297 0.197
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1342 on 164 degrees of freedom
Multiple R-squared: 0.2574, Adjusted R-squared: 0.2438
F-statistic: 18.95 on 3 and 164 DF, p-value: 1.338e-10
Zde je opět potvrzený trend vyšší míry oproti mladší populaci a také vyšší retence očkovaných jedinců.
presc_id_counts <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'CPZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 30 AND 39
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'OZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 30 AND 39
")%>%
mutate(presc_year = as.integer(format(as.Date(presc_date), "%Y")),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"))
monthly_unique_clients <- presc_id_counts %>%
mutate(yearmonth = floor_date(presc_date, unit = "month")) %>%
distinct(client_id, yearmonth, vacc_status) %>%
count(yearmonth, vacc_status, name = "n_unique_clients")
presc_years <- presc_id_counts %>%
mutate(
presc_year = year(presc_date)
) %>%
distinct(client_id, first_presc_year, presc_year, vacc_status) %>%
filter(first_presc_year >= 2017, first_presc_year <= 2023)
retention_data <- presc_years %>%
mutate(year_offset = presc_year - first_presc_year) %>%
filter(year_offset >= 0)
retention_summary <- retention_data %>%
group_by(first_presc_year, year_offset, vacc_status) %>%
summarise(n_clients = n_distinct(client_id), .groups = "drop")
cohort_sizes <- retention_data %>%
filter(year_offset == 0) %>%
group_by(first_presc_year, vacc_status) %>%
summarise(cohort_size = n_distinct(client_id), .groups = "drop")
retention_summary <- retention_summary %>%
left_join(cohort_sizes, by = c("first_presc_year", "vacc_status")) %>%
mutate(retention_rate = n_clients / cohort_size)
retention_summary_filtered <- retention_summary %>%
filter(year_offset > 0)ggplot(retention_summary_filtered, aes(x = year_offset, y = retention_rate, color = as.factor(first_presc_year))) +
geom_line(size = 1.2) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_color_viridis_d(name = "Rok prvopředpisu") +
labs(
title = "Retence (meziroční předpisovost) podle vakcinace",
x = "Počet let od prvopředpisu",
y = "Míra multipředpisovosti"
) +
facet_wrap(~ vacc_status) +
theme_minimal(base_size = 13)Během porovnání obou období dochází k mírnému nárůstu. Zde je ale nutné zmínit, že samotný průměr míry multipředpisovosti není vhodnou metrikou a to z důvodu stoupavého trendu až do roku 2020 a poté spíše konstantního průběhu pro očkovanou populaci. Rozdíly mezi skupinami jsou opět nevýznamné.
multi_presc_yearly <- multi_presc %>%
mutate(year = year(date)) %>%
filter(year > 2017)
multi_clients_yearly <- multi_presc_yearly %>%
group_by(year, vacc_status, client_id) %>%
summarise(presc_count = n(), .groups = "drop") %>%
mutate(is_multi = presc_count > 1)
multi_summary_yearly <- multi_clients_yearly %>%
group_by(year, vacc_status) %>%
summarise(
total_clients = n(),
multi_clients = sum(is_multi),
multi_rate = (multi_clients / total_clients) * 1000,
.groups = "drop"
)
multi_summary_yearly <- multi_summary_yearly %>%
mutate(
period_group = case_when(
year < 2020 ~ "Pre-COVID",
year > 2021 ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
multi_summary_clean <- multi_summary_yearly %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
t_test_results <- multi_summary_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(multi_rate ~ period_group)$p.value,
mean_before = mean(multi_rate[period_group == "Pre-COVID"]),
mean_after = mean(multi_rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before,
.groups = "drop"
)t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání roční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.2185289 | 346.9531 | 359.5639 | 1.036347 |
| Vakcinace | 0.1433689 | 336.7882 | 358.9081 | 1.065679 |
ggplot(multi_summary_yearly, aes(x = year, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.6, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = 2020, xmax = 2021,
ymin = -Inf, ymax = Inf,
alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Roční multipředpisovost na 1000 předepsaných klientů") +
scale_x_continuous(breaks = unique(multi_summary_yearly$year)) +
labs(
title = "Roční multipředpisovost podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Rok",
color = "Vakcinace"
) +
theme_minimal()did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year >= 2022, 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
1 2 3 4 5 6 7 8
-0.002338 -0.018123 0.002338 0.018123 0.013060 -0.005682 -0.013060 0.005682
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.84919 0.01158 504.933 9.23e-11 ***
vaccinated -0.02990 0.01638 -1.825 0.1421
post_covid 0.03562 0.01638 2.174 0.0954 .
vaccinated:post_covid 0.02814 0.02317 1.215 0.2913
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01638 on 4 degrees of freedom
Multiple R-squared: 0.8446, Adjusted R-squared: 0.7281
F-statistic: 7.247 on 3 and 4 DF, p-value: 0.04286
Měsíční multipředpisovost vykazuje mírně klesavou tendenci v období po covidu u nevakcinované populace a stoupaní u vakcinované. Rozdíly uvnitř skupin i mezi skupinami jsou však nevýznamné.
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání měsíční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.6580879 | 152.3939 | 154.5434 | 1.014105 |
| Vakcinace | 0.7603065 | 153.0694 | 154.0509 | 1.006412 |
ggplot(multi_summary_monthly, aes(x = year_month, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.5, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Měsíční multipředpisovost na 1000 předepsaných klientů") +
labs(
title = "Měsíční multipředpisovost podle vakcinačního statusu",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year_month >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
Min 1Q Median 3Q Max
-0.264666 -0.053956 0.005046 0.062361 0.290939
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.019074 0.018752 267.662 <2e-16 ***
vaccinated 0.008983 0.026519 0.339 0.736
post_covid 0.017669 0.026519 0.666 0.507
vaccinated:post_covid -0.010698 0.037503 -0.285 0.776
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09186 on 92 degrees of freedom
Multiple R-squared: 0.005949, Adjusted R-squared: -0.02647
F-statistic: 0.1835 on 3 and 92 DF, p-value: 0.9073
Poměry způsobu podání se oproti mladším skupinám srovnaly a vykazují zhruba stejné hodnoty.
prescriptions <- prescriptions %>%
mutate(
drug_form_std = case_when(
is.na(drug_form) ~ NA_character_,
str_detect(drug_form, "tableta|Tableta") ~ "Tableta",
str_detect(drug_form, "tobolka|Tobolka") ~ "Tobolka",
str_detect(drug_form, "injekční|Injekční|infuzní|Infuzní") ~ "Injekce / Infuze",
str_detect(drug_form, "č[íi]pek|Čípek") ~ "Čípek",
str_detect(drug_form, "perorální roztok|Perorální roztok") ~ "Perorální roztok",
TRUE ~ "Other"
)
)
prescriptions <- prescriptions %>%
mutate(
year = year(presc_date)
)
ggplot(prescriptions %>% filter(!is.na(presc_date), !is.na(drug_form)), aes(x = factor(year), fill = drug_form_std)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Poměr způsobu podání v čase",
x = "Rok",
y = "%",
fill = "Způsob podání"
) +
theme_minimal()client_yearly_counts <- prescriptions %>%
filter(!is.na(presc_date), !is.na(drug_form)) %>%
group_by(client_id, year, drug_form_std) %>%
summarise(presc_count = n(), .groups = "drop")
average_freq_per_year <- client_yearly_counts %>%
group_by(year, drug_form_std) %>%
summarise(avg_presc_per_client = mean(presc_count), .groups = "drop")
ggplot(average_freq_per_year, aes(x = factor(year), y = avg_presc_per_client, fill = drug_form_std)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Počet předpisů za rok na klienta dle způsobu podání",
x = "Rok",
y = "Průměrný počet předpisů na klienta",
fill = "Způsob podání"
) +
theme_minimal()Při vykreslení poměru injekcí a infuzí na tablety je zde opět zajímavý nárůst ve prospěch tablet ve druhé polovině roku 2021.
filtered_data <- prescriptions %>%
mutate(
drug_form_std = ifelse(is.na(drug_form_std), "Unknown", drug_form_std)
) %>%
filter(
drug_form_std %in% c("Injekce / Infuze", "Tableta"),
!is.na(vaccinated),
!is.na(presc_yearmonth)
)
form_vax_counts <- filtered_data %>%
group_by(presc_yearmonth, vaccinated, drug_form_std) %>%
summarise(n = n(), .groups = "drop")
month_totals <- filtered_data %>%
group_by(presc_yearmonth, vaccinated) %>%
summarise(total = n(), .groups = "drop")
form_vax_ratios <- form_vax_counts %>%
left_join(month_totals, by = c("presc_yearmonth", "vaccinated")) %>%
mutate(ratio = n / total)
form_vax_ratios <- form_vax_ratios %>%
mutate(group = paste(drug_form_std, ifelse(vaccinated == "Vakcinace", "Vakcinace", "Bez vakcinace"), sep = " - "))
ggplot(form_vax_ratios, aes(x = presc_yearmonth, y = ratio, color = group)) +
geom_line(size = 0.6) +
geom_smooth(se = FALSE, span = 0.3) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_x_date(date_labels = "%Y",
date_breaks = "1 year",
expand = c(0.01, 0)) +
labs(
title = "Poměr počtu předpisů podle způsobu podání a vakcinace",
x = "Datum",
y = "Poměr počtu předpisů",
color = "Forma podání + Vakcinace"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5))Zatímco prvopředpisovost významně stoupla, tak multipředpisovost zůstává v zásadě na stejné úrovni a v některých případech dokonce i klesá. Rozdíly mezi očkovanými a neočkovanými jedinci jsou zanedbatelné.
Trendy způsobu podání jsou v této skupině oproti mladším populacím ustálené s jediným výkyvem ve druhé polovině roku 2022.
Všeobecně je riziko prvopředpisu po vakcinaci vyšší oproti baseline
před očkováním. Z hlediska statistické významnosti lze ale s určitou
dávkou jistoty takto zhodnotit pouze období 0-29,
90-179 a 270-365.
combined_vaccinated_prescribed <- dbGetQuery(con,
"
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
'CPZP' as ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN cpzp_prescriptions p ON c.client_id = p.client_id
AND p.presc_order = 1
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-06-15' - INTERVAL '7 days'
AND DATE '2021-06-15' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 16 AND 39
UNION
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
'OZP' AS ins_company
FROM OZP_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN ozp_prescriptions p ON c.client_id = p.client_id
AND p.presc_order = 1
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-06-15' - INTERVAL '7 days'
AND DATE '2021-06-15' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 16 AND 39
") %>%
mutate(client_id = row_number()) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_vaccinated <- combined_vaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_vaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_vaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_vaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 12816L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 12816, number of events= 2136
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.25395 1.28910 0.10263 2.474 0.0133 *
exposure_start_day2 0.11084 1.11722 0.08064 1.374 0.1693
exposure_start_day3 0.14328 1.15405 0.06749 2.123 0.0338 *
exposure_start_day4 0.10718 1.11313 0.06845 1.566 0.1174
exposure_start_day5 0.15386 1.16633 0.06556 2.347 0.0189 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.289 0.7757 1.0542 1.576
exposure_start_day2 1.117 0.8951 0.9539 1.309
exposure_start_day3 1.154 0.8665 1.0111 1.317
exposure_start_day4 1.113 0.8984 0.9734 1.273
exposure_start_day5 1.166 0.8574 1.0257 1.326
Concordance= 0.725 (se = 0.008 )
Likelihood ratio test= 12.52 on 5 df, p=0.03
Wald test = 12.65 on 5 df, p=0.03
Score (logrank) test = 12.69 on 5 df, p=0.03
I v případě neočkovaných jsou rizika po celou dobu relativně vyšší. Díky většímu množství dat je také možné zhodnotit tyto výsledky na vyšší hladině statistické významnosti.
combined_unvaccinated_prescribed <- dbGetQuery(con,
"
SELECT
p.client_id,
DATE '2021-06-15' AS vaccination_date,
p.date AS event_date,
'CPZP' as ins_company
FROM cpzp_prescriptions p
LEFT JOIN cpzp_clients c ON P.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-06-15'- INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-06-15' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 16 AND 39
AND p.date BETWEEN DATE '2021-06-15' - INTERVAL '1 year' AND DATE '2021-06-15' + INTERVAL '1 year'
AND p.presc_order = 1
AND c.n_vacc = 0
UNION
SELECT
p.client_id,
DATE '2021-06-15' AS vaccination_date,
p.date AS event_date,
'OZP' as ins_company
FROM ozp_prescriptions p
LEFT JOIN ozp_clients c ON P.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-06-15' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-06-15' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 16 AND 39
AND p.date BETWEEN DATE '2021-06-15' - INTERVAL '1 year' AND DATE '2021-06-15' + INTERVAL '1 year'
AND p.presc_order = 1
AND c.n_vacc = 0
") %>%
mutate(client_id = row_number()) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_unvaccinated <- combined_unvaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_unvaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_unvaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_unvaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 39084L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 39084, number of events= 6514
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.23835 1.26916 0.05838 4.083 4.45e-05 ***
exposure_start_day2 0.08343 1.08701 0.04608 1.810 0.07024 .
exposure_start_day3 0.08523 1.08897 0.03900 2.185 0.02886 *
exposure_start_day4 0.01957 1.01976 0.04003 0.489 0.62489
exposure_start_day5 0.11229 1.11884 0.03763 2.984 0.00285 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.269 0.7879 1.1319 1.423
exposure_start_day2 1.087 0.9200 0.9931 1.190
exposure_start_day3 1.089 0.9183 1.0088 1.175
exposure_start_day4 1.020 0.9806 0.9428 1.103
exposure_start_day5 1.119 0.8938 1.0393 1.204
Concordance= 0.731 (se = 0.005 )
Likelihood ratio test= 24.44 on 5 df, p=2e-04
Wald test = 25.16 on 5 df, p=1e-04
Score (logrank) test = 25.22 on 5 df, p=1e-04
Průběh rizik je pro obě skupiny v zásadě srovnatelný s vyšším stanoveným rizikem pro očkovanou populaci. Ani jeden z rozdílů však není statisticky signifikantní.
coef_unvax <- sccs_result_unvaccinated$coefficients
conf_unvax <- sccs_result_unvaccinated$conf.int
df_unvax <- data.frame(
term = rownames(coef_unvax),
logHR = coef_unvax[, "coef"],
SE = coef_unvax[, "se(coef)"],
HR = coef_unvax[, "exp(coef)"],
conf.low = conf_unvax[, "lower .95"],
conf.high = conf_unvax[, "upper .95"]
)
coef_vax <- sccs_result_vaccinated$coefficients
conf_vax <- sccs_result_vaccinated$conf.int
df_vax <- data.frame(
term = rownames(coef_vax),
logHR = coef_vax[, "coef"],
SE = coef_vax[, "se(coef)"],
HR = coef_vax[, "exp(coef)"],
conf.low = conf_vax[, "lower .95"],
conf.high = conf_vax[, "upper .95"]
)
df_unvax <- df_unvax %>% mutate(group = "Bez Vakcinace")
df_vax <- df_vax %>% mutate(group = "Vakcinace")
combined <- bind_rows(df_vax, df_unvax)
combined$period <- as.integer(gsub("exposure_start_day", "", combined$term))
start_days <- c(0, 30, 90, 180, 270)
end_days <- c(29, 89, 179, 269, Inf)
period_labels <- ifelse(
is.infinite(end_days),
paste0(start_days, "+"),
paste0(start_days, "-", end_days)
)
combined$period_label <- factor(combined$period,
levels = seq_along(period_labels),
labels = period_labels)
ggplot(combined, aes(x = period_label, y = HR, color = group)) +
geom_point(position = position_dodge(width = 0.6), size = 3) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2,
position = position_dodge(width = 0.6)) +
geom_hline(yintercept = 1, linetype = "dashed") +
labs(
x = "Období expozice (dny)",
y = "Poměr rizik (HR)",
title = "Porovnání SCCS: očkovaní vs neočkovaní"
) +
theme_minimal()vax_df <- combined %>%
filter(group == "Vakcinace") %>%
rename_with(~ paste0(., "_vax"), -c(period, period_label))
unvax_df <- combined %>%
filter(group == "Bez Vakcinace") %>%
rename_with(~ paste0(., "_unvax"), -c(period, period_label))
df_diff <- inner_join(vax_df, unvax_df, by = c("period", "period_label"))
df_diff <- df_diff %>%
mutate(
logHR_diff = logHR_vax - logHR_unvax,
se_diff = sqrt(SE_vax^2 + SE_unvax^2),
ci_low = logHR_diff - 1.96 * se_diff,
ci_high = logHR_diff + 1.96 * se_diff
)
ggplot(df_diff, aes(x = period_label, y = logHR_diff)) +
geom_point(color = "blue", size = 3) +
geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Rozdíl v log(HR) (Očkovaní - Neočkovaní)",
x = "Období expozice",
y = "Rozdíl v log(HR)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))common_terms <- intersect(df_vax$term, df_unvax$term)
did_results <- data.frame(
term = character(),
period_label = character(),
logHR_vax = numeric(),
logHR_unvax = numeric(),
DiD_logHR = numeric(),
DiD_SE = numeric(),
Z = numeric(),
P_value = numeric(),
stringsAsFactors = FALSE
)
for (term in common_terms) {
logHR_vax <- df_vax %>% filter(term == !!term) %>% pull(logHR)
SE_vax <- df_vax %>% filter(term == !!term) %>% pull(SE)
logHR_unvax <- df_unvax %>% filter(term == !!term) %>% pull(logHR)
SE_unvax <- df_unvax %>% filter(term == !!term) %>% pull(SE)
DiD_logHR <- logHR_vax - logHR_unvax
DiD_SE <- sqrt(SE_vax^2 + SE_unvax^2)
Z <- DiD_logHR / DiD_SE
P_value <- 2 * pnorm(-abs(Z))
period_idx <- as.integer(gsub("exposure_start_day", "", term))
period_label <- period_labels[period_idx]
did_results <- rbind(did_results, data.frame(
term = term,
period_label = period_label,
logHR_vax = logHR_vax,
logHR_unvax = logHR_unvax,
DiD_logHR = DiD_logHR,
DiD_SE = DiD_SE,
Z = Z,
P_value = P_value
))
}
did_results %>%
transmute(Období = period_label, `log(HR) Vakcinace` = logHR_vax, `log(HR) Bez vakcinace` = logHR_unvax, `DiD log(HR)` = DiD_logHR, p_value = P_value) %>%
kable(format = "html", caption = "Porovnání rizik očkovaných a neočkovaných (SCCS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Období | log(HR) Vakcinace | log(HR) Bez vakcinace | DiD log(HR) | p_value |
|---|---|---|---|---|
| 0-29 | 0.2539458 | 0.2383525 | 0.0155933 | 0.8949331 |
| 30-89 | 0.1108449 | 0.0834299 | 0.0274150 | 0.7678754 |
| 90-179 | 0.1432802 | 0.0852301 | 0.0580501 | 0.4564462 |
| 180-269 | 0.1071752 | 0.0195718 | 0.0876034 | 0.2692642 |
| 270+ | 0.1538623 | 0.1122887 | 0.0415736 | 0.5823525 |
Vyšší množství dat pro úhrn předpisů celkem umožňuje čistší pohled na zhodnocená rizika. Ta jsou stejně jako u prvopředpisů po celé sledované období vyšší než baseline.
combined_vaccinated_prescribed <- dbGetQuery(con,
"
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
d.drug_form,
'CPZP' as ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN cpzp_prescriptions p ON c.client_id = p.client_id
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-06-15' - INTERVAL '7 days'
AND DATE '2021-06-15' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 16 AND 39
UNION
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
d.drug_form,
'OZP' AS ins_company
FROM OZP_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN ozp_prescriptions p ON c.client_id = p.client_id
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-06-15' - INTERVAL '7 days'
AND DATE '2021-06-15' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 16 AND 39
") %>%
mutate(client_id = as.numeric(factor(client_id)),
drug_form = tolower(drug_form),
inj_or_inf = str_detect(drug_form, "injekční|infuzní"))
injectable_infusional <- combined_vaccinated_prescribed %>%
filter(inj_or_inf) %>%
arrange(client_id, event_date) %>%
group_by(client_id) %>%
mutate(
day_diff = as.numeric(difftime(event_date, lag(event_date, default = first(event_date)), units = "days")),
new_window = is.na(day_diff) | day_diff > 10,
window_id = cumsum(new_window)
) %>%
group_by(client_id, window_id) %>%
slice(1) %>%
ungroup()
other_drugs <- combined_vaccinated_prescribed %>%
filter(!inj_or_inf)
combined_vaccinated_prescribed <- bind_rows(injectable_infusional, other_drugs) %>%
arrange(client_id, event_date) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_vaccinated <- combined_vaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_vaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_vaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_vaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 33414L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 33414, number of events= 5569
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.09611 1.10088 0.06856 1.402 0.16094
exposure_start_day2 0.11954 1.12698 0.04995 2.393 0.01671 *
exposure_start_day3 0.11603 1.12303 0.04241 2.736 0.00622 **
exposure_start_day4 0.10327 1.10879 0.04262 2.423 0.01539 *
exposure_start_day5 0.27407 1.31531 0.03895 7.036 1.98e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.101 0.9084 0.9625 1.259
exposure_start_day2 1.127 0.8873 1.0219 1.243
exposure_start_day3 1.123 0.8904 1.0335 1.220
exposure_start_day4 1.109 0.9019 1.0199 1.205
exposure_start_day5 1.315 0.7603 1.2186 1.420
Concordance= 0.732 (se = 0.005 )
Likelihood ratio test= 50.28 on 5 df, p=1e-09
Wald test = 51.74 on 5 df, p=6e-10
Score (logrank) test = 51.97 on 5 df, p=5e-10
Je patrné že průběh rizik pro neočkované téměř kopíruje průběh rizik očkovaných jedinců.
combined_unvaccinated_prescribed <- dbGetQuery(con,
"
SELECT
p.client_id,
DATE '2021-06-15' AS vaccination_date,
p.date AS event_date,
d.drug_form,
'CPZP' as ins_company
FROM cpzp_prescriptions p
LEFT JOIN cpzp_clients c ON P.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-06-15' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-06-15' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 16 AND 39
AND p.date BETWEEN DATE '2021-06-15' - INTERVAL '1 year' AND DATE '2021-06-15' + INTERVAL '1 year'
AND c.n_vacc = 0
UNION
SELECT
p.client_id,
DATE '2021-06-15' AS vaccination_date,
p.date AS event_date,
d.drug_form,
'OZP' as ins_company
FROM ozp_prescriptions p
LEFT JOIN ozp_clients c ON P.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-06-15' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-06-15' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 16 AND 39
AND p.date BETWEEN DATE '2021-06-15' - INTERVAL '1 year' AND DATE '2021-06-15' + INTERVAL '1 year'
AND c.n_vacc = 0
") %>%
mutate(client_id = as.numeric(factor(client_id)),
drug_form = tolower(drug_form),
inj_or_inf = str_detect(drug_form, "injekční|infuzní"))
injectable_infusional <- combined_unvaccinated_prescribed %>%
filter(inj_or_inf) %>%
arrange(client_id, event_date) %>%
group_by(client_id) %>%
mutate(
day_diff = as.numeric(difftime(event_date, lag(event_date, default = first(event_date)), units = "days")),
new_window = is.na(day_diff) | day_diff > 10,
window_id = cumsum(new_window)
) %>%
group_by(client_id, window_id) %>%
slice(1) %>%
ungroup()
other_drugs <- combined_unvaccinated_prescribed %>%
filter(!inj_or_inf)
combined_unvaccinated_prescribed <- bind_rows(injectable_infusional, other_drugs) %>%
arrange(client_id, event_date) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_unvaccinated <- combined_unvaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_unvaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_unvaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_unvaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 109116L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 109116, number of events= 18186
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.19387 1.21394 0.03608 5.374 7.71e-08 ***
exposure_start_day2 0.05440 1.05591 0.02826 1.925 0.054178 .
exposure_start_day3 0.14992 1.16174 0.02303 6.510 7.50e-11 ***
exposure_start_day4 0.08169 1.08512 0.02365 3.454 0.000552 ***
exposure_start_day5 0.19609 1.21663 0.02207 8.885 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.214 0.8238 1.131 1.303
exposure_start_day2 1.056 0.9471 0.999 1.116
exposure_start_day3 1.162 0.8608 1.110 1.215
exposure_start_day4 1.085 0.9216 1.036 1.137
exposure_start_day5 1.217 0.8219 1.165 1.270
Concordance= 0.731 (se = 0.003 )
Likelihood ratio test= 112 on 5 df, p=<2e-16
Wald test = 113.6 on 5 df, p=<2e-16
Score (logrank) test = 113.8 on 5 df, p=<2e-16
Porovnatelnost mezi skupinami je téměř nulová. Jediný rozdíl je v
období 270-365, kde je vyšší riziko pro očkované s
p-hodnotou Waldova testu těsně nad úrovní 0,08.
coef_unvax <- sccs_result_unvaccinated$coefficients
conf_unvax <- sccs_result_unvaccinated$conf.int
df_unvax <- data.frame(
term = rownames(coef_unvax),
logHR = coef_unvax[, "coef"],
SE = coef_unvax[, "se(coef)"],
HR = coef_unvax[, "exp(coef)"],
conf.low = conf_unvax[, "lower .95"],
conf.high = conf_unvax[, "upper .95"]
)
coef_vax <- sccs_result_vaccinated$coefficients
conf_vax <- sccs_result_vaccinated$conf.int
df_vax <- data.frame(
term = rownames(coef_vax),
logHR = coef_vax[, "coef"],
SE = coef_vax[, "se(coef)"],
HR = coef_vax[, "exp(coef)"],
conf.low = conf_vax[, "lower .95"],
conf.high = conf_vax[, "upper .95"]
)
df_unvax <- df_unvax %>% mutate(group = "Unvaccinated")
df_vax <- df_vax %>% mutate(group = "Vaccinated")
combined <- bind_rows(df_vax, df_unvax)
combined$period <- as.integer(gsub("exposure_start_day", "", combined$term))
start_days <- c(0, 30, 90, 180, 270)
end_days <- c(29, 89, 179, 269, Inf)
period_labels <- ifelse(
is.infinite(end_days),
paste0(start_days, "+"),
paste0(start_days, "-", end_days)
)
combined$period_label <- factor(combined$period,
levels = seq_along(period_labels),
labels = period_labels)
ggplot(combined, aes(x = period_label, y = HR, color = group)) +
geom_point(position = position_dodge(width = 0.6), size = 3) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2,
position = position_dodge(width = 0.6)) +
geom_hline(yintercept = 1, linetype = "dashed") +
labs(
x = "Období expozice (dny)",
y = "Poměr rizik (HR)",
title = "Porovnání SCCS: očkovaní vs neočkovaní"
) +
theme_minimal()vax_df <- combined %>%
filter(group == "Vaccinated") %>%
rename_with(~ paste0(., "_vax"), -c(period, period_label))
unvax_df <- combined %>%
filter(group == "Unvaccinated") %>%
rename_with(~ paste0(., "_unvax"), -c(period, period_label))
df_diff <- inner_join(vax_df, unvax_df, by = c("period", "period_label"))
df_diff <- df_diff %>%
mutate(
logHR_diff = logHR_vax - logHR_unvax,
se_diff = sqrt(SE_vax^2 + SE_unvax^2),
ci_low = logHR_diff - 1.96 * se_diff,
ci_high = logHR_diff + 1.96 * se_diff
)
ggplot(df_diff, aes(x = period_label, y = logHR_diff)) +
geom_point(color = "blue", size = 3) +
geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Rozdíl v log(HR) (Očkovaní - Neočkovaní)",
x = "Období expozice",
y = "Rozdíl v log(HR)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))common_terms <- intersect(df_vax$term, df_unvax$term)
did_results <- data.frame(
term = character(),
period_label = character(),
logHR_vax = numeric(),
logHR_unvax = numeric(),
DiD_logHR = numeric(),
DiD_SE = numeric(),
Z = numeric(),
P_value = numeric(),
stringsAsFactors = FALSE
)
for (term in common_terms) {
logHR_vax <- df_vax %>% filter(term == !!term) %>% pull(logHR)
SE_vax <- df_vax %>% filter(term == !!term) %>% pull(SE)
logHR_unvax <- df_unvax %>% filter(term == !!term) %>% pull(logHR)
SE_unvax <- df_unvax %>% filter(term == !!term) %>% pull(SE)
DiD_logHR <- logHR_vax - logHR_unvax
DiD_SE <- sqrt(SE_vax^2 + SE_unvax^2)
Z <- DiD_logHR / DiD_SE
P_value <- 2 * pnorm(-abs(Z))
period_idx <- as.integer(gsub("exposure_start_day", "", term))
period_label <- period_labels[period_idx]
did_results <- rbind(did_results, data.frame(
term = term,
period_label = period_label,
logHR_vax = logHR_vax,
logHR_unvax = logHR_unvax,
DiD_logHR = DiD_logHR,
DiD_SE = DiD_SE,
Z = Z,
P_value = P_value
))
}
did_results %>%
transmute(Období = period_label, `log(HR) Vakcinace` = logHR_vax, `log(HR) Bez vakcinace` = logHR_unvax, `DiD log(HR)` = DiD_logHR, p_value = P_value) %>%
kable(format = "html", caption = "Porovnání rizik (SCCS) očkovaných a neočkovaných") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Období | log(HR) Vakcinace | log(HR) Bez vakcinace | DiD log(HR) | p_value |
|---|---|---|---|---|
| 0-29 | 0.0961135 | 0.1938689 | -0.0977554 | 0.2070137 |
| 30-89 | 0.1195438 | 0.0544028 | 0.0651410 | 0.2563519 |
| 90-179 | 0.1160288 | 0.1499191 | -0.0338902 | 0.4825014 |
| 180-269 | 0.1032718 | 0.0816914 | 0.0215803 | 0.6579593 |
| 270+ | 0.2740703 | 0.1960854 | 0.0779850 | 0.0815362 |
Krátkodobá rizika jsou ve všech případech vyšší oproti baseline a v
zásadě srovnatelná. Jediný částečně významný rozdíl je u celkového úhrnu
předpisů v období 270-365 s vyšším rizikem pro
očkované.
unique_clients <- prescriptions %>%
filter(!is.na(age_2021), !is.na(sex), !is.na(vaccinated)) %>%
distinct(client_id, .keep_all = TRUE)
age_data <- unique_clients %>%
group_by(vaccinated, age_2021) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Age",
value = as.character(age_2021)) %>%
ungroup()
sex_data <- unique_clients %>%
group_by(vaccinated, sex) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(vaccinated) %>%
mutate(prop = count / sum(count),
variable = "Sex",
value = sex) %>%
ungroup()Stáří populace a poměr dle pohlaví jsou v zásadě srovnatelné.
ggplot(unique_clients, aes(x = age_2021, fill = factor(vaccinated))) +
geom_histogram(aes(y = after_stat(density)), position = "identity", alpha = 0.4, binwidth = 1) +
scale_fill_brewer(palette = "Set1", name = "Vaccinated") +
scale_x_continuous(breaks = seq(floor(min(unique_clients$age)), ceiling(max(unique_clients$age)), by = 1)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Věkové rozložení podle statusu vakcinace",
x = "Věk", y = "%",
fill = "Vakcinace"
) +
theme_minimal()ggplot(sex_data, aes(x = sex, y = prop, color = vaccinated, group = vaccinated)) +
geom_line(size = 1) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Pohlaví podle vakcinace",
x = "Pohlaví",
y = "%",
color = "Vakcinace"
) +
theme_minimal()prescriptions_clean <- prescriptions %>%
filter(!is.na(presc_date)) %>%
mutate(presc_date = as.Date(presc_date)) %>%
arrange(presc_date) %>%
filter(presc_date >= as.Date("2015-01-01") & presc_date <= as.Date("2023-12-31")) %>%
mutate(count = 1)
results_vacc <- prescriptions_clean %>%
filter(!is.na(vaccinated)) %>%
group_by(vaccinated) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$vaccinated)
analyze_group(df = filter(prescriptions_clean, vaccinated == group_val),
group_id = group_val,
group_type = "Vaccination")
}) %>%
compact() %>%
bind_rows()
results_sex <- prescriptions_clean %>%
filter(!is.na(sex)) %>%
group_by(sex) %>%
group_split() %>%
map(~ {
group_val <- unique(.x$sex)
analyze_group(df = filter(prescriptions_clean, sex == group_val),
group_id = group_val,
group_type = "Sex")
}) %>%
compact() %>%
bind_rows()
results_all <- bind_rows(results_vacc, results_sex) %>%
arrange(p_value)
plot_data_all <- results_all %>%
unnest(plot_data, names_sep = "_")ETS model vykazuje odchylky napříč všemi skupinami s nejvýraznějším rozdílem u neočkované populace.
results_all %>%
mutate(Skupina = group, RMSE = rmse, rRMSE = rrmse) %>%
select(
Skupina, RMSE, rRMSE, t_stat, p_value
) %>%
kable(format = "html", caption = "Porovnání předpisovosti v čase (ETS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Skupina | RMSE | rRMSE | t_stat | p_value |
|---|---|---|---|---|
| Bez vakcinace | 245.4343 | 0.1401934 | -5.0840885 | 0.0000063 |
| M | 336.2235 | 0.1120938 | -2.8601803 | 0.0062977 |
| Vakcinace | 704.0997 | 0.1028859 | -0.7368828 | 0.4648563 |
| F | 576.8500 | 0.1031065 | 0.0795498 | 0.9369331 |
ggplot(plot_data_all, aes(x = plot_data_month)) +
geom_line(aes(y = plot_data_prescriptions), color = "blue") +
geom_line(aes(y = plot_data_predicted), color = "red", linetype = "dashed") +
geom_vline(xintercept = as.Date("2020-01-01"), linetype = "dotted") +
facet_wrap(~ paste(plot_data_group_type, plot_data_group, sep = ": "), scales = "free_y") +
labs(
title = "Realita vs Předpověď (ETS)",
subtitle = "Modrá = Realita, Červená = Předpověď",
x = "Datum", y = "Počet předpisů"
) +
theme_minimal()first_presc <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'CPZP' AS ins_company
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 40 AND 59
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.date ASC) AS first_presc_date,
'OZP' AS ins_company
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id AND p.presc_order = 1
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 40 AND 59
") %>%
mutate(first_presc_yearmonth = floor_date(first_presc_date, unit = "month"),
sex = ifelse(sex == "Z", "F", sex),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"),
age_2021 = 2021 - birthyear)
monthly_first_presc <- first_presc %>%
count(first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(first_presc_yearmonth) %>%
mutate(
cum_new_presc = cumsum(new_prescriptions),
total_population = nrow(first_presc),
at_risk_population = total_population - lag(cum_new_presc, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_clean <- monthly_first_presc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
ttest <- t.test(rate ~ period_group, data = monthly_clean)Průměrná prvopředpisovost ve srovnání po vs před je mírně nižší. Samotný rozdíl průměrů však není statisticky významný.
ggplot(monthly_first_presc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000)) +
geom_line(color = "steelblue", size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet prvopředpisů na 1000 osob bez předpisu", limits = c(0, 10)) +
labs(
title = "Prvopředpisovost v čase",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum"
) +
theme_minimal()
Welch Two Sample t-test
data: rate by period_group
t = -1.1168, df = 45.53, p-value = 0.27
alternative hypothesis: true difference in means between group Post-COVID and group Pre-COVID is not equal to 0
95 percent confidence interval:
-0.0004915077 0.0001407987
sample estimates:
mean in group Post-COVID mean in group Pre-COVID
0.005124781 0.005300136
total_by_vacc <- first_presc %>%
count(vacc_status, name = "total_population")
monthly_by_vacc <- first_presc %>%
count(vacc_status, first_presc_yearmonth, name = "new_prescriptions") %>%
arrange(vacc_status, first_presc_yearmonth) %>%
group_by(vacc_status) %>%
mutate(
cum_prescriptions = cumsum(new_prescriptions),
total_population = total_by_vacc$total_population[match(vacc_status, total_by_vacc$vacc_status)],
at_risk_population = total_population - lag(cum_prescriptions, default = 0),
rate = new_prescriptions / at_risk_population
) %>%
ungroup() %>%
mutate(
period_group = case_when(
first_presc_yearmonth < as.Date("2020-01-01") ~ "Pre-COVID",
first_presc_yearmonth > as.Date("2021-12-31") ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
monthly_by_vacc_clean <- monthly_by_vacc %>%
filter(first_presc_yearmonth >= as.Date("2018-01-01")) %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
first_presc_table <- monthly_by_vacc_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(rate ~ period_group)$p.value,
mean_before = mean(rate[period_group == "Pre-COVID"]),
mean_after = mean(rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before
)Obě skupiny vykazují mírné snížení v prvopředpisovosti. Rozdíl mezi skupinami je nevýznamný.
first_presc_table %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before * 1000, `Průměr po` = mean_after * 1000, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání prvopředpisovosti v čase podle vakcinace") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0606129 | 4.325463 | 4.072212 | 0.9414511 |
| Vakcinace | 0.4742458 | 5.642638 | 5.518322 | 0.9779684 |
ggplot(monthly_by_vacc %>% filter(first_presc_yearmonth >= as.Date("2018-01-01")),
aes(x = first_presc_yearmonth, y = rate * 1000, color = vacc_status)) +
geom_line(size = 1) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Počet předpisů na 1000 osob bez předpisu", limits = c(0, 10)) +
labs(
title = "Prvopředpisovost v čase podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()monthly_rates <- monthly_by_vacc %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID")) %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(first_presc_yearmonth >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid)
did_model <- lm(log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(rate * 1000) ~ vaccinated * post_covid, data = monthly_rates)
Residuals:
Min 1Q Median 3Q Max
-0.35508 -0.10601 -0.02006 0.08872 0.75335
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.58083 0.02292 68.980 < 2e-16 ***
vaccinated 0.24642 0.03241 7.603 2.12e-12 ***
post_covid -0.18320 0.04287 -4.273 3.26e-05 ***
vaccinated:post_covid 0.05736 0.06063 0.946 0.345
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1775 on 164 degrees of freedom
Multiple R-squared: 0.4203, Adjusted R-squared: 0.4097
F-statistic: 39.64 on 3 and 164 DF, p-value: < 2.2e-16
presc_id_counts <- dbGetQuery(con,
"
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'CPZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
cpzp_clients c
LEFT JOIN cpzp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN cpzp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date = '9999-12-31'
AND 2021 - c.birthyear BETWEEN 40 AND 59
UNION
SELECT
c.client_id,
c.n_vacc,
c.n_presc,
c.sex,
c.birthyear,
2021 - c.birthyear AS age_2021,
p.date as presc_date,
p.presc_order,
'OZP' AS ins_company,
YEAR(FIRST(p.date) OVER (PARTITION BY c.client_id ORDER BY p.presc_order ASC)) AS first_presc_year
FROM
ozp_clients c
LEFT JOIN ozp_ins_start_end i ON c.client_id = i.client_id
LEFT JOIN ozp_prescriptions p ON c.client_id = p.client_id
WHERE
i.ins_start_date < '2015-01-01' AND i.ins_end_date IS NULL
AND 2021 - c.birthyear BETWEEN 40 AND 59
")%>%
mutate(presc_year = as.integer(format(as.Date(presc_date), "%Y")),
vacc_status = ifelse(n_vacc > 0, "Vakcinace", "Bez vakcinace"))
monthly_unique_clients <- presc_id_counts %>%
mutate(yearmonth = floor_date(presc_date, unit = "month")) %>%
distinct(client_id, yearmonth, vacc_status) %>%
count(yearmonth, vacc_status, name = "n_unique_clients")
presc_years <- presc_id_counts %>%
mutate(
presc_year = year(presc_date)
) %>%
distinct(client_id, first_presc_year, presc_year, vacc_status) %>%
filter(first_presc_year >= 2017, first_presc_year <= 2023)
retention_data <- presc_years %>%
mutate(year_offset = presc_year - first_presc_year) %>%
filter(year_offset >= 0)
retention_summary <- retention_data %>%
group_by(first_presc_year, year_offset, vacc_status) %>%
summarise(n_clients = n_distinct(client_id), .groups = "drop")
cohort_sizes <- retention_data %>%
filter(year_offset == 0) %>%
group_by(first_presc_year, vacc_status) %>%
summarise(cohort_size = n_distinct(client_id), .groups = "drop")
retention_summary <- retention_summary %>%
left_join(cohort_sizes, by = c("first_presc_year", "vacc_status")) %>%
mutate(retention_rate = n_clients / cohort_size)
retention_summary_filtered <- retention_summary %>%
filter(year_offset > 0)Míra meziročních předpisů stále potvrzuje trend z předchozích sledování. Je vyšší než u mladší populace a míra pro očkované je všeobecně vyšší.
ggplot(retention_summary_filtered, aes(x = year_offset, y = retention_rate, color = as.factor(first_presc_year))) +
geom_line(size = 1.2) +
geom_point(size = 2) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_color_viridis_d(name = "Rok prvopředpisu") +
labs(
title = "Retence (meziroční předpisovost) podle vakcinace",
x = "Počet let od prvopředpisu",
y = "Míra multipředpisovosti"
) +
facet_wrap(~ vacc_status) +
theme_minimal(base_size = 13)multi_presc_yearly <- multi_presc %>%
mutate(year = year(date)) %>%
filter(year > 2017)
multi_clients_yearly <- multi_presc_yearly %>%
group_by(year, vacc_status, client_id) %>%
summarise(presc_count = n(), .groups = "drop") %>%
mutate(is_multi = presc_count > 1)
multi_summary_yearly <- multi_clients_yearly %>%
group_by(year, vacc_status) %>%
summarise(
total_clients = n(),
multi_clients = sum(is_multi),
multi_rate = (multi_clients / total_clients) * 1000,
.groups = "drop"
)
multi_summary_yearly <- multi_summary_yearly %>%
mutate(
period_group = case_when(
year < 2020 ~ "Pre-COVID",
year > 2021 ~ "Post-COVID",
TRUE ~ "COVID-Shock"
)
)
multi_summary_clean <- multi_summary_yearly %>%
filter(period_group %in% c("Pre-COVID", "Post-COVID"))
t_test_results <- multi_summary_clean %>%
group_by(vacc_status) %>%
summarise(
p_value = t.test(multi_rate ~ period_group)$p.value,
mean_before = mean(multi_rate[period_group == "Pre-COVID"]),
mean_after = mean(multi_rate[period_group == "Post-COVID"]),
rate = mean_after / mean_before,
.groups = "drop"
)Roční multipředpisovost stoupá v obou kategoriích. Míra změny mezi skupinami je nevýznamná.
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání roční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0892644 | 409.2620 | 434.4393 | 1.061519 |
| Vakcinace | 0.1074908 | 419.2872 | 438.8464 | 1.046649 |
ggplot(multi_summary_yearly, aes(x = year, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.6, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = 2020, xmax = 2021,
ymin = -Inf, ymax = Inf,
alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Roční multipředpisovost na 1000 předepsaných klientů") +
scale_x_continuous(breaks = unique(multi_summary_yearly$year)) +
labs(
title = "Roční multipředpisovost podle vakcinace",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Rok",
color = "Vakcinace"
) +
theme_minimal()did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year >= 2022, 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
1 2 3 4 5 6 7 8
-0.014567 -0.012360 0.014567 0.012360 -0.008741 -0.007212 0.008741 0.007212
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.01425 0.01111 541.532 6.98e-11 ***
vaccinated 0.02423 0.01571 1.543 0.198
post_covid 0.05977 0.01571 3.805 0.019 *
vaccinated:post_covid -0.01412 0.02221 -0.636 0.559
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01571 on 4 degrees of freedom
Multiple R-squared: 0.8636, Adjusted R-squared: 0.7612
F-statistic: 8.439 on 3 and 4 DF, p-value: 0.03328
Míra měsíční multipředpisovosti vykazuje zajímavé rozdělení trendů opačnými směry v období covidu, kdy všeobecné riziko bylo u vakcinovaných jedinců menší. Tento rozdíl je statisticky významný.
t_test_results %>%
transmute(Vakcinace = vacc_status, p_value, `Průměr před` = mean_before, `Průměr po` = mean_after, `Poměr` = rate) %>%
kable(format = "html", caption = "Porovnání měsíční multipředpisovosti v čase") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Vakcinace | p_value | Průměr před | Průměr po | Poměr |
|---|---|---|---|---|
| Bez vakcinace | 0.0002998 | 151.3781 | 162.6159 | 1.074237 |
| Vakcinace | 0.5848585 | 150.4276 | 151.5545 | 1.007491 |
ggplot(multi_summary_monthly, aes(x = year_month, y = multi_rate, color = vacc_status)) +
geom_line(size = 0.5, alpha = 0.4) +
stat_smooth(method = "loess", se = FALSE, size = 1.2) +
annotate("rect",
xmin = as.Date("2020-01-01"), xmax = as.Date("2021-12-31"),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "gray50") +
scale_y_continuous(name = "Měsíční multipředpisovost na 1000 předepsaných klientů") +
labs(
title = "Měsíční multipředpisovost podle vakcinačního statusu",
subtitle = "Šedá = covidové období (vyřazeno z t-testu)",
x = "Datum",
color = "Vakcinace"
) +
theme_minimal()did_data <- multi_summary_clean %>%
mutate(
vaccinated = if_else(vacc_status == "Vakcinace", 1, 0),
post_covid = if_else(year_month >= as.Date("2022-01-01"), 1, 0),
interaction = vaccinated * post_covid
)
did_model <- lm(log(multi_rate) ~ vaccinated * post_covid, data = did_data)
cat("```\n", paste(capture.output(summary(did_model)), collapse = "\n"), "\n```")
Call:
lm(formula = log(multi_rate) ~ vaccinated * post_covid, data = did_data)
Residuals:
Min 1Q Median 3Q Max
-0.122492 -0.038151 -0.003402 0.035210 0.183638
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.01774 0.01129 444.295 < 2e-16 ***
vaccinated -0.00548 0.01597 -0.343 0.73229
post_covid 0.07192 0.01597 4.503 1.96e-05 ***
vaccinated:post_covid -0.06415 0.02259 -2.840 0.00555 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05533 on 92 degrees of freedom
Multiple R-squared: 0.2555, Adjusted R-squared: 0.2312
F-statistic: 10.52 on 3 and 92 DF, p-value: 5.127e-06
Trendy nastavené v předchozích kohortách jsou v zásadě nezměněné i ve starší populaci.
prescriptions <- prescriptions %>%
mutate(
drug_form_std = case_when(
is.na(drug_form) ~ NA_character_,
str_detect(drug_form, "tableta|Tableta") ~ "Tableta",
str_detect(drug_form, "tobolka|Tobolka") ~ "Tobolka",
str_detect(drug_form, "injekční|Injekční|infuzní|Infuzní") ~ "Injekce / Infuze",
str_detect(drug_form, "č[íi]pek|Čípek") ~ "Čípek",
str_detect(drug_form, "perorální roztok|Perorální roztok") ~ "Perorální roztok",
TRUE ~ "Other"
)
)
prescriptions <- prescriptions %>%
mutate(
year = year(presc_date)
)
ggplot(prescriptions %>% filter(!is.na(presc_date), !is.na(drug_form)), aes(x = factor(year), fill = drug_form_std)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Poměr způsobu podání v čase",
x = "Rok",
y = "%",
fill = "Způsob podání"
) +
theme_minimal()client_yearly_counts <- prescriptions %>%
filter(!is.na(presc_date), !is.na(drug_form)) %>%
group_by(client_id, year, drug_form_std) %>%
summarise(presc_count = n(), .groups = "drop")
average_freq_per_year <- client_yearly_counts %>%
group_by(year, drug_form_std) %>%
summarise(avg_presc_per_client = mean(presc_count), .groups = "drop")
ggplot(average_freq_per_year, aes(x = factor(year), y = avg_presc_per_client, fill = drug_form_std)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Počet předpisů za rok na klienta dle způsobu podání",
x = "Rok",
y = "Průměrný počet předpisů na klienta",
fill = "Způsob podání"
) +
theme_minimal()Výkyv v poměru předpisů ve druhé polovině roku 2022 je v případě této kohorty nejvyšší.
filtered_data <- prescriptions %>%
mutate(
drug_form_std = ifelse(is.na(drug_form_std), "Unknown", drug_form_std)
) %>%
filter(
drug_form_std %in% c("Injekce / Infuze", "Tableta"),
!is.na(vaccinated),
!is.na(presc_yearmonth)
)
form_vax_counts <- filtered_data %>%
group_by(presc_yearmonth, vaccinated, drug_form_std) %>%
summarise(n = n(), .groups = "drop")
month_totals <- filtered_data %>%
group_by(presc_yearmonth, vaccinated) %>%
summarise(total = n(), .groups = "drop")
form_vax_ratios <- form_vax_counts %>%
left_join(month_totals, by = c("presc_yearmonth", "vaccinated")) %>%
mutate(ratio = n / total)
form_vax_ratios <- form_vax_ratios %>%
mutate(group = paste(drug_form_std, ifelse(vaccinated == "Vakcinace", "Vakcinace", "Bez vakcinace"), sep = " - "))
ggplot(form_vax_ratios, aes(x = presc_yearmonth, y = ratio, color = group)) +
geom_line(size = 0.6) +
geom_smooth(se = FALSE, span = 0.3) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_x_date(date_labels = "%Y",
date_breaks = "1 year",
expand = c(0.01, 0)) +
labs(
title = "Poměr počtu předpisů podle způsobu podání a vakcinace",
x = "Datum",
y = "Poměr počtu předpisů",
color = "Forma podání + Vakcinace"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5))Riziko prvopředpisů u očkovaných je v zásadě nižší oproti baseline.
Statisticky významné výsledky jsou však jen v časovém období
30-89 a 180-269.
combined_vaccinated_prescribed <- dbGetQuery(con,
"
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
'CPZP' as ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN cpzp_prescriptions p ON c.client_id = p.client_id
AND p.presc_order = 1
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-05-21' - INTERVAL '7 days'
AND DATE '2021-05-21' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 40 AND 59
UNION
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
'OZP' AS ins_company
FROM OZP_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN ozp_prescriptions p ON c.client_id = p.client_id
AND p.presc_order = 1
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-05-21' - INTERVAL '7 days'
AND DATE '2021-05-21' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 40 AND 59
") %>%
mutate(client_id = row_number()) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_vaccinated <- combined_vaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_vaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_vaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_vaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 34968L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 34968, number of events= 5828
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 -0.07643 0.92642 0.06796 -1.125 0.26074
exposure_start_day2 -0.28572 0.75147 0.05452 -5.241 1.6e-07 ***
exposure_start_day3 -0.05380 0.94762 0.04154 -1.295 0.19525
exposure_start_day4 -0.10989 0.89593 0.04250 -2.586 0.00972 **
exposure_start_day5 -0.04944 0.95176 0.04040 -1.224 0.22107
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 0.9264 1.079 0.8109 1.0584
exposure_start_day2 0.7515 1.331 0.6753 0.8362
exposure_start_day3 0.9476 1.055 0.8735 1.0280
exposure_start_day4 0.8959 1.116 0.8243 0.9738
exposure_start_day5 0.9518 1.051 0.8793 1.0302
Concordance= 0.763 (se = 0.005 )
Likelihood ratio test= 32.93 on 5 df, p=4e-06
Wald test = 31.21 on 5 df, p=9e-06
Score (logrank) test = 31.36 on 5 df, p=8e-06
Zde je zajímavá změna trendu, kdy riziko neočkovaných v období
180-269 mírně přesáhne baseline.
combined_unvaccinated_prescribed <- dbGetQuery(con,
"
SELECT
p.client_id,
DATE '2021-05-21' AS vaccination_date,
p.date AS event_date,
'CPZP' as ins_company
FROM cpzp_prescriptions p
LEFT JOIN cpzp_clients c ON P.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-05-21'- INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-05-21' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 40 AND 59
AND p.date BETWEEN DATE '2021-05-21' - INTERVAL '1 year' AND DATE '2021-05-21' + INTERVAL '1 year'
AND p.presc_order = 1
AND c.n_vacc = 0
UNION
SELECT
p.client_id,
DATE '2021-05-21' AS vaccination_date,
p.date AS event_date,
'OZP' as ins_company
FROM ozp_prescriptions p
LEFT JOIN ozp_clients c ON P.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-05-21' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-05-21' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 40 AND 59
AND p.date BETWEEN DATE '2021-05-21' - INTERVAL '1 year' AND DATE '2021-05-21' + INTERVAL '1 year'
AND p.presc_order = 1
AND c.n_vacc = 0
") %>%
mutate(client_id = row_number()) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_unvaccinated <- combined_unvaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_unvaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_unvaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_unvaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 45756L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 45756, number of events= 7626
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 -0.03966 0.96111 0.05977 -0.664 0.50696
exposure_start_day2 -0.14062 0.86882 0.04575 -3.074 0.00211 **
exposure_start_day3 0.03271 1.03325 0.03589 0.911 0.36215
exposure_start_day4 0.06801 1.07038 0.03539 1.922 0.05465 .
exposure_start_day5 -0.03389 0.96668 0.03592 -0.943 0.34552
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 0.9611 1.0405 0.8549 1.0806
exposure_start_day2 0.8688 1.1510 0.7943 0.9503
exposure_start_day3 1.0332 0.9678 0.9631 1.1086
exposure_start_day4 1.0704 0.9342 0.9986 1.1473
exposure_start_day5 0.9667 1.0345 0.9010 1.0372
Concordance= 0.749 (se = 0.004 )
Likelihood ratio test= 18.4 on 5 df, p=0.002
Wald test = 18.04 on 5 df, p=0.003
Score (logrank) test = 18.07 on 5 df, p=0.003
Při porovnání rizik je všeobecně méně riziková skupina ta očkovaná
(která ani v jednom případě nepřesáhne úroveň zvýšeného rizika oproti
baseline). Statisticky významné rozdíly jsou v období 30-89
a 180-269, kdy jsou rizika očkovaných významně menší.
coef_unvax <- sccs_result_unvaccinated$coefficients
conf_unvax <- sccs_result_unvaccinated$conf.int
df_unvax <- data.frame(
term = rownames(coef_unvax),
logHR = coef_unvax[, "coef"],
SE = coef_unvax[, "se(coef)"],
HR = coef_unvax[, "exp(coef)"],
conf.low = conf_unvax[, "lower .95"],
conf.high = conf_unvax[, "upper .95"]
)
coef_vax <- sccs_result_vaccinated$coefficients
conf_vax <- sccs_result_vaccinated$conf.int
df_vax <- data.frame(
term = rownames(coef_vax),
logHR = coef_vax[, "coef"],
SE = coef_vax[, "se(coef)"],
HR = coef_vax[, "exp(coef)"],
conf.low = conf_vax[, "lower .95"],
conf.high = conf_vax[, "upper .95"]
)
df_unvax <- df_unvax %>% mutate(group = "Bez Vakcinace")
df_vax <- df_vax %>% mutate(group = "Vakcinace")
combined <- bind_rows(df_vax, df_unvax)
combined$period <- as.integer(gsub("exposure_start_day", "", combined$term))
start_days <- c(0, 30, 90, 180, 270)
end_days <- c(29, 89, 179, 269, Inf)
period_labels <- ifelse(
is.infinite(end_days),
paste0(start_days, "+"),
paste0(start_days, "-", end_days)
)
combined$period_label <- factor(combined$period,
levels = seq_along(period_labels),
labels = period_labels)
ggplot(combined, aes(x = period_label, y = HR, color = group)) +
geom_point(position = position_dodge(width = 0.6), size = 3) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2,
position = position_dodge(width = 0.6)) +
geom_hline(yintercept = 1, linetype = "dashed") +
labs(
x = "Období expozice (dny)",
y = "Poměr rizik (HR)",
title = "Porovnání SCCS: očkovaní vs neočkovaní"
) +
theme_minimal()vax_df <- combined %>%
filter(group == "Vakcinace") %>%
rename_with(~ paste0(., "_vax"), -c(period, period_label))
unvax_df <- combined %>%
filter(group == "Bez Vakcinace") %>%
rename_with(~ paste0(., "_unvax"), -c(period, period_label))
df_diff <- inner_join(vax_df, unvax_df, by = c("period", "period_label"))
df_diff <- df_diff %>%
mutate(
logHR_diff = logHR_vax - logHR_unvax,
se_diff = sqrt(SE_vax^2 + SE_unvax^2),
ci_low = logHR_diff - 1.96 * se_diff,
ci_high = logHR_diff + 1.96 * se_diff
)
ggplot(df_diff, aes(x = period_label, y = logHR_diff)) +
geom_point(color = "blue", size = 3) +
geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Rozdíl v log(HR) (Očkovaní - Neočkovaní)",
x = "Období expozice",
y = "Rozdíl v log(HR)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))common_terms <- intersect(df_vax$term, df_unvax$term)
did_results <- data.frame(
term = character(),
period_label = character(),
logHR_vax = numeric(),
logHR_unvax = numeric(),
DiD_logHR = numeric(),
DiD_SE = numeric(),
Z = numeric(),
P_value = numeric(),
stringsAsFactors = FALSE
)
for (term in common_terms) {
logHR_vax <- df_vax %>% filter(term == !!term) %>% pull(logHR)
SE_vax <- df_vax %>% filter(term == !!term) %>% pull(SE)
logHR_unvax <- df_unvax %>% filter(term == !!term) %>% pull(logHR)
SE_unvax <- df_unvax %>% filter(term == !!term) %>% pull(SE)
DiD_logHR <- logHR_vax - logHR_unvax
DiD_SE <- sqrt(SE_vax^2 + SE_unvax^2)
Z <- DiD_logHR / DiD_SE
P_value <- 2 * pnorm(-abs(Z))
period_idx <- as.integer(gsub("exposure_start_day", "", term))
period_label <- period_labels[period_idx]
did_results <- rbind(did_results, data.frame(
term = term,
period_label = period_label,
logHR_vax = logHR_vax,
logHR_unvax = logHR_unvax,
DiD_logHR = DiD_logHR,
DiD_SE = DiD_SE,
Z = Z,
P_value = P_value
))
}
did_results %>%
transmute(Období = period_label, `log(HR) Vakcinace` = logHR_vax, `log(HR) Bez vakcinace` = logHR_unvax, `DiD log(HR)` = DiD_logHR, p_value = P_value) %>%
kable(format = "html", caption = "Porovnání rizik očkovaných a neočkovaných (SCCS)") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Období | log(HR) Vakcinace | log(HR) Bez vakcinace | DiD log(HR) | p_value |
|---|---|---|---|---|
| 0-29 | -0.0764318 | -0.0396637 | -0.0367680 | 0.6845618 |
| 30-89 | -0.2857232 | -0.1406190 | -0.1451042 | 0.0414702 |
| 90-179 | -0.0538000 | 0.0327083 | -0.0865082 | 0.1150647 |
| 180-269 | -0.1098894 | 0.0680120 | -0.1779014 | 0.0012968 |
| 270+ | -0.0494389 | -0.0338858 | -0.0155531 | 0.7735826 |
Díky opravdu velkému množství dat jsou veškeré výsledky statisticky významné. Riziko jako takové kolísá nejprve snížením pod úroveň baseline a poté postupným navyšováním s vrcholem na konci sledovaného období.
combined_vaccinated_prescribed <- dbGetQuery(con,
"
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
d.drug_form,
'CPZP' as ins_company
FROM cpzp_vaccinations v
LEFT JOIN cpzp_clients c ON v.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN cpzp_prescriptions p ON c.client_id = p.client_id
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-05-21' - INTERVAL '7 days'
AND DATE '2021-05-21' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 40 AND 59
UNION
SELECT
v.client_id,
v.date AS vaccination_date,
p.date AS event_date,
d.drug_form,
'OZP' AS ins_company
FROM OZP_vaccinations v
LEFT JOIN ozp_clients c ON v.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
INNER JOIN ozp_prescriptions p ON c.client_id = p.client_id
AND p.date BETWEEN v.date - INTERVAL '1 year' AND v.date + INTERVAL '1 year'
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE v.event_order = 1
AND s.ins_start_date < v.date - INTERVAL '1 year'
AND s.ins_end_date > v.date + INTERVAL '1 year'
AND v.date BETWEEN DATE '2021-05-21' - INTERVAL '7 days'
AND DATE '2021-05-21' + INTERVAL '7 days'
AND 2021 - c.birthyear BETWEEN 40 AND 59
") %>%
mutate(client_id = as.numeric(factor(client_id)),
drug_form = tolower(drug_form),
inj_or_inf = str_detect(drug_form, "injekční|infuzní"))
injectable_infusional <- combined_vaccinated_prescribed %>%
filter(inj_or_inf) %>%
arrange(client_id, event_date) %>%
group_by(client_id) %>%
mutate(
day_diff = as.numeric(difftime(event_date, lag(event_date, default = first(event_date)), units = "days")),
new_window = is.na(day_diff) | day_diff > 10,
window_id = cumsum(new_window)
) %>%
group_by(client_id, window_id) %>%
slice(1) %>%
ungroup()
other_drugs <- combined_vaccinated_prescribed %>%
filter(!inj_or_inf)
combined_vaccinated_prescribed <- bind_rows(injectable_infusional, other_drugs) %>%
arrange(client_id, event_date) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_vaccinated <- combined_vaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_vaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_vaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_vaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 183642L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 183642, number of events= 30607
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.07770 1.08080 0.02882 2.696 0.007011 **
exposure_start_day2 -0.14287 0.86687 0.02331 -6.130 8.82e-10 ***
exposure_start_day3 0.06440 1.06651 0.01804 3.570 0.000357 ***
exposure_start_day4 0.08328 1.08685 0.01791 4.651 3.30e-06 ***
exposure_start_day5 0.16960 1.18483 0.01689 10.043 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.0808 0.9252 1.0214 1.1436
exposure_start_day2 0.8669 1.1536 0.8282 0.9074
exposure_start_day3 1.0665 0.9376 1.0295 1.1049
exposure_start_day4 1.0868 0.9201 1.0494 1.1257
exposure_start_day5 1.1848 0.8440 1.1463 1.2247
Concordance= 0.744 (se = 0.002 )
Likelihood ratio test= 183.4 on 5 df, p=<2e-16
Wald test = 182.6 on 5 df, p=<2e-16
Score (logrank) test = 183.2 on 5 df, p=<2e-16
Trend neočkovaných v zásadě kopíruje trend vakcinované skupiny a to i v ohledu významnosti výsledků.
combined_unvaccinated_prescribed <- dbGetQuery(con,
"
SELECT
p.client_id,
DATE '2021-05-21' AS vaccination_date,
p.date AS event_date,
d.drug_form,
'CPZP' as ins_company
FROM cpzp_prescriptions p
LEFT JOIN cpzp_clients c ON P.client_id = c.client_id
LEFT JOIN cpzp_ins_start_end s ON c.client_id = s.client_id
LEFT JOIN cpzp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-05-21' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-05-21' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 40 AND 59
AND p.date BETWEEN DATE '2021-05-21' - INTERVAL '1 year' AND DATE '2021-05-21' + INTERVAL '1 year'
AND c.n_vacc = 0
UNION
SELECT
p.client_id,
DATE '2021-05-21' AS vaccination_date,
p.date AS event_date,
d.drug_form,
'OZP' as ins_company
FROM ozp_prescriptions p
LEFT JOIN ozp_clients c ON P.client_id = c.client_id
LEFT JOIN ozp_ins_start_end s ON c.client_id = s.client_id
LEFT JOIN ozp_drugs_catalog d ON p.presc_id = d.presc_id
WHERE 1 = 1
AND s.ins_start_date < DATE '2021-05-21' - INTERVAL '1 year'
AND s.ins_end_date > DATE '2021-05-21' + INTERVAL '1 year'
AND 2021 - c.birthyear BETWEEN 40 AND 59
AND p.date BETWEEN DATE '2021-05-21' - INTERVAL '1 year' AND DATE '2021-05-21' + INTERVAL '1 year'
AND c.n_vacc = 0
") %>%
mutate(client_id = as.numeric(factor(client_id)),
drug_form = tolower(drug_form),
inj_or_inf = str_detect(drug_form, "injekční|infuzní"))
injectable_infusional <- combined_unvaccinated_prescribed %>%
filter(inj_or_inf) %>%
arrange(client_id, event_date) %>%
group_by(client_id) %>%
mutate(
day_diff = as.numeric(difftime(event_date, lag(event_date, default = first(event_date)), units = "days")),
new_window = is.na(day_diff) | day_diff > 10,
window_id = cumsum(new_window)
) %>%
group_by(client_id, window_id) %>%
slice(1) %>%
ungroup()
other_drugs <- combined_unvaccinated_prescribed %>%
filter(!inj_or_inf)
combined_unvaccinated_prescribed <- bind_rows(injectable_infusional, other_drugs) %>%
arrange(client_id, event_date) %>%
transmute(
person_id = client_id,
exposure_start_date = vaccination_date,
event_date,
observation_start_date = vaccination_date - days(365),
observation_end_date = vaccination_date + days(365)
)
sccs_unvaccinated <- combined_unvaccinated_prescribed %>%
mutate(
event_day = as.integer(event_date - observation_start_date),
exposure_start_day = as.integer(exposure_start_date - observation_start_date),
observation_start_day = 0,
observation_end_day = as.integer(observation_end_date - observation_start_date)
)
sccs_result_unvaccinated <- standardsccs(
formula = event_day ~ exposure_start_day,
indiv = person_id,
astart = observation_start_day,
aend = observation_end_day,
aevent = event_day,
adrug = exposure_start_day,
aedrug = exposure_start_day + 365,
expogrp = list(c(0, 30, 90, 180, 270)),
data = sccs_unvaccinated
)
cat("```\n", paste(capture.output(print(sccs_result_unvaccinated)), collapse = "\n"), "\n```") Call:
coxph(formula = Surv(rep(1, 236910L), event) ~ exposure_start_day +
strata(indivL) + offset(log(interval)), data = chopdat, method = "exact")
n= 236910, number of events= 39485
coef exp(coef) se(coef) z Pr(>|z|)
exposure_start_day1 0.04053 1.04136 0.02593 1.563 0.1181
exposure_start_day2 -0.06283 0.93910 0.01991 -3.155 0.0016 **
exposure_start_day3 0.06723 1.06954 0.01594 4.217 2.47e-05 ***
exposure_start_day4 0.11704 1.12417 0.01563 7.487 7.02e-14 ***
exposure_start_day5 0.17413 1.19021 0.01491 11.675 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
exposure_start_day1 1.0414 0.9603 0.9898 1.0957
exposure_start_day2 0.9391 1.0649 0.9032 0.9765
exposure_start_day3 1.0695 0.9350 1.0366 1.1035
exposure_start_day4 1.1242 0.8895 1.0902 1.1591
exposure_start_day5 1.1902 0.8402 1.1559 1.2255
Concordance= 0.742 (se = 0.002 )
Likelihood ratio test= 197.1 on 5 df, p=<2e-16
Wald test = 199.9 on 5 df, p=<2e-16
Score (logrank) test = 200.3 on 5 df, p=<2e-16
Vzhledem k tomu, že trendy jsou v zásadě srovnatelné, není možné po
většinu sledovaného období jednoznačně vyvodit vyšší riziko pro jednu
nebo druhou skupinu. Jediný rozdíl je v období 30-89, kdy
je riziko významně nižší pro očkovanou populaci.
coef_unvax <- sccs_result_unvaccinated$coefficients
conf_unvax <- sccs_result_unvaccinated$conf.int
df_unvax <- data.frame(
term = rownames(coef_unvax),
logHR = coef_unvax[, "coef"],
SE = coef_unvax[, "se(coef)"],
HR = coef_unvax[, "exp(coef)"],
conf.low = conf_unvax[, "lower .95"],
conf.high = conf_unvax[, "upper .95"]
)
coef_vax <- sccs_result_vaccinated$coefficients
conf_vax <- sccs_result_vaccinated$conf.int
df_vax <- data.frame(
term = rownames(coef_vax),
logHR = coef_vax[, "coef"],
SE = coef_vax[, "se(coef)"],
HR = coef_vax[, "exp(coef)"],
conf.low = conf_vax[, "lower .95"],
conf.high = conf_vax[, "upper .95"]
)
df_unvax <- df_unvax %>% mutate(group = "Unvaccinated")
df_vax <- df_vax %>% mutate(group = "Vaccinated")
combined <- bind_rows(df_vax, df_unvax)
combined$period <- as.integer(gsub("exposure_start_day", "", combined$term))
start_days <- c(0, 30, 90, 180, 270)
end_days <- c(29, 89, 179, 269, Inf)
period_labels <- ifelse(
is.infinite(end_days),
paste0(start_days, "+"),
paste0(start_days, "-", end_days)
)
combined$period_label <- factor(combined$period,
levels = seq_along(period_labels),
labels = period_labels)
ggplot(combined, aes(x = period_label, y = HR, color = group)) +
geom_point(position = position_dodge(width = 0.6), size = 3) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2,
position = position_dodge(width = 0.6)) +
geom_hline(yintercept = 1, linetype = "dashed") +
labs(
x = "Období expozice (dny)",
y = "Poměr rizik (HR)",
title = "Porovnání SCCS: očkovaní vs neočkovaní"
) +
theme_minimal()vax_df <- combined %>%
filter(group == "Vaccinated") %>%
rename_with(~ paste0(., "_vax"), -c(period, period_label))
unvax_df <- combined %>%
filter(group == "Unvaccinated") %>%
rename_with(~ paste0(., "_unvax"), -c(period, period_label))
df_diff <- inner_join(vax_df, unvax_df, by = c("period", "period_label"))
df_diff <- df_diff %>%
mutate(
logHR_diff = logHR_vax - logHR_unvax,
se_diff = sqrt(SE_vax^2 + SE_unvax^2),
ci_low = logHR_diff - 1.96 * se_diff,
ci_high = logHR_diff + 1.96 * se_diff
)
ggplot(df_diff, aes(x = period_label, y = logHR_diff)) +
geom_point(color = "blue", size = 3) +
geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2, color = "blue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Rozdíl v log(HR) (Očkovaní - Neočkovaní)",
x = "Období expozice",
y = "Rozdíl v log(HR)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))common_terms <- intersect(df_vax$term, df_unvax$term)
did_results <- data.frame(
term = character(),
period_label = character(),
logHR_vax = numeric(),
logHR_unvax = numeric(),
DiD_logHR = numeric(),
DiD_SE = numeric(),
Z = numeric(),
P_value = numeric(),
stringsAsFactors = FALSE
)
for (term in common_terms) {
logHR_vax <- df_vax %>% filter(term == !!term) %>% pull(logHR)
SE_vax <- df_vax %>% filter(term == !!term) %>% pull(SE)
logHR_unvax <- df_unvax %>% filter(term == !!term) %>% pull(logHR)
SE_unvax <- df_unvax %>% filter(term == !!term) %>% pull(SE)
DiD_logHR <- logHR_vax - logHR_unvax
DiD_SE <- sqrt(SE_vax^2 + SE_unvax^2)
Z <- DiD_logHR / DiD_SE
P_value <- 2 * pnorm(-abs(Z))
period_idx <- as.integer(gsub("exposure_start_day", "", term))
period_label <- period_labels[period_idx]
did_results <- rbind(did_results, data.frame(
term = term,
period_label = period_label,
logHR_vax = logHR_vax,
logHR_unvax = logHR_unvax,
DiD_logHR = DiD_logHR,
DiD_SE = DiD_SE,
Z = Z,
P_value = P_value
))
}
did_results %>%
transmute(Období = period_label, `log(HR) Vakcinace` = logHR_vax, `log(HR) Bez vakcinace` = logHR_unvax, `DiD log(HR)` = DiD_logHR, p_value = P_value) %>%
kable(format = "html", caption = "Porovnání rizik (SCCS) očkovaných a neočkovaných") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)| Období | log(HR) Vakcinace | log(HR) Bez vakcinace | DiD log(HR) | p_value |
|---|---|---|---|---|
| 0-29 | 0.0777049 | 0.0405313 | 0.0371735 | 0.3376538 |
| 30-89 | -0.1428676 | -0.0628347 | -0.0800329 | 0.0090377 |
| 90-179 | 0.0643952 | 0.0672268 | -0.0028315 | 0.9063690 |
| 180-269 | 0.0832830 | 0.1170410 | -0.0337580 | 0.1555280 |
| 270+ | 0.1696027 | 0.1741257 | -0.0045229 | 0.8408985 |
Oproti ostatním kohortám dochází ve věkové kategorii
40-59 k mírnému propadu v prvopředpisovosti, kde mezi
jednotlivými skupinami není významný rozdíl. Naopak měsíční
multipředpisovost vykazuje nepatrný nárůst se statisticky signifikantním
rozdílem mezi skupinami (konkrétně téměř nulový nárůst u očkovaných a
zhruba 7% nárůst u neočkované populace.).
Trendy ve formě podání jsou téměř konstantní s jedinou výjimkou opět ve druhé polovině roku 2021.
Z hlediska krátkodobých rizik se trendy obou skupin víceméně kopírují s několika málo významnými odchylkami, kde je nižší riziko vypočtené u očkovaných.
Všeobecně je možné konstatovat, že riziko předpisu kortikoidsteroidů během covidu a v období po covidu rostlo a neustále roste. Co se týče vlivu očkování na tento trend, tak ve většině případů není možné zhodnotit rozdíly mezi skupinami jako významné. V několika případech, kdy se riziko liší, je vakcinovaná populace vyhodnocená jako méně riziková s jedinou výjimkou u rizik celkového počtu předpisů 270-365 dní po vakcinaci pro kohortu ve věku 16-39 let.
V zásadě je také možné říct, že rizika se týkala zejména mladší populace. V případě starší populace sledujeme spíše konstantní rizika a nebo v některých případech dokonce nepatrné snížení.
Ve zkratce tedy:
Změnila se v období pandemie nějak celková spotřeba kortikoidů? Ano. Nejprve došlo k očekávatelnému propadu a poté došlo k navýšení, které přesahuje trend nastavený předcovidovým obdobím.
Jsou trendy týkající se spotřeby kortikoidů stejné u očkované i neočkované populace? Chránily tedy vakcíny před nárůstem spotřeby kortikoidů, neměly žádný efekt, nebo dokonce škodily? Trendy jsou v zásadě srovnatelné a v případech, kdy tomu tak není, je vakcinovaná populace vyhodnocená jako s nižším rizikem.
Je rozdíl mezi jednotlivými věkovými skupinami? Ano. K největším rozdílům oproti celkovému trendu dochází u mladší populace. Je však otázka zdali to není způsobené prostým stárnutím dané kohorty.
Je důležité zmínit, že samotné zvýšení počtu předpisů nemusí nutně znamenat zhoršení situace. Vzhledem k tomu, že se kortikoidy předepisují na celou škálu různých problémů a každý problém má rozdílné dávkovací schéma, může jednoduše docházet i k nárůstu diagnóz s nutností častějšího (nebo sezónního) dávkování. Vzhledem k naší odbornosti by ale bylo případné zhodnocení spíše spekulací.
Zajímavé bylo pozorovat výrazný rozdíl v poměru předepsaných injekcí / infuzí na tablety u neočkované populace ve druhé polovině roku 2022. Došlo zde totiž k výraznému snížení počtu injekcí / infuzí a zároveň i zvýšení tablet. Zdali se jedná o výsledky covidových nařízení nebo jiné důvody opět přenecháme erudovanějším osobám.
Během zkoumání bylo (přirozeně) nalezeno mnoho slepých uliček, ale i zajímavých dat, která nejsou v tomto textu z kapacitních důvodů uvedena. Zejména data týkající se multipředpisovosti mohou ukázat mnoho dalších směrů k bádání. Pro příklad: průměrné intervaly mezi jednotlivými předpisy na omezeném intervalu 0-365, míra rizika druhého předpisu po prvním předpisu v čase, vliv incidence COVID-19 na předpisovost kortikoidů, průměrný počet předpisů na multipředepisovaného klienta, změny v sezónnosti předpisů a mnoho dalších.